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Abstract 


An  Approach  for  the  Adaptive  Solution  of 
Optimization  Problems  Governed  by  Partial 
Differential  Equations  with  Uncertain  Coefficients 

by 

Drew  P.  Kouri 

In  this  thesis,  I  develop  and  analyze  a  general  theoretical  framework  for  optimiza¬ 
tion  problems  governed  by  partial  differential  equations  (PDEs)  with  random  inputs. 
This  theoretical  framework  is  based  on  the  adjoint  calculus  for  computing  derivatives 
of  the  objective  function.  I  develop  an  efficient  discretization  and  numerical  opti¬ 
mization  algorithm  for  the  solution  of  these  PDE  constrained  optimization  problems. 
Using  derivative  based  numerical  optimization  algorithms  to  solve  these  PDE  con¬ 
strained  optimization  problems  is  computationally  expensive  clue  to  the  large  number 
of  PDE  solves  required  at  each  iteration.  I  present  a  stochastic  collocation  discretiza¬ 
tion  for  these  PDE  constrained  optimization  problems  and  prove  the  convergence  of 
this  discretization  method  for  a  specific  class  of  problems. 

The  stochastic  collocation  discretization  technique  described  here  requires  many 
decoupled  PDE  solves  to  compute  gradient  and  Hessian  information.  I  develop  a  novel 
optimization  theoretic  framework  based  on  dimension  adaptive  sparse  grid  quadrature 
to  reduce  the  total  number  of  PDE  solves.  My  adaptive  framework  employs  basic 
or  retrospective  trust  regions  to  manage  the  adapted  stochastic  collocation  models. 
In  addition,  I  prove  global  first  order  convergence  of  the  retrospective  trust  region 


Ill 


algorithm  under  weakened  assumptions  on  the  modeled  gradients.  In  fact,  if  one  can 
bound  the  error  between  actual  and  modeled  gradients  using  reliable  and  efficient 
a  posteriori  error  estimators,  then  the  global  convergence  of  the  retrospective  trust 
region  algorithm  follows. 

Finally,  I  describe  a  high  performance  implementation  of  my  adaptive  colloca¬ 
tion  and  trust  region  framework.  This  framework  can  be  efficiently  implemented  in 
the  C++  programming  language  using  the  Message  Passing  Interface  (MPI).  Due 
to  the  large  number  of  PDE  solves  required  for  derivative  computations,  it  is  essen¬ 
tial  to  choose  inexpensive  approximate  models  and  appropriate  large-scale  nonlinear 
programming  techniques  throughout  the  optimization  routine  to  obtain  an  efficient 
algorithm.  Numerical  results  for  the  adaptive  solution  of  these  optimization  problems 
are  presented. 
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Introduction 


With  the  advances  in  computational  power  and  efficient  numerical  simulation  of  com¬ 
plex  physical  systems,  simulation  based  optimization  and  uncertainty  quantification 
are  becoming  increasingly  feasible.  The  ability  to  simulate  complex  behaviors  of 
physical  systems  gives  engineers  and  experimental  scientists  the  ability  to  make  pre¬ 
dictions  and  hypotheses  about  certain  outputs  of  interest.  These  simulations  are 
critical  for  scientists  of  many  fields  such  as:  fluid  dynamics,  heat  transfer,  chemi¬ 
cally  reacting  systems,  oil  field  research,  nonproliferation  seismology,  monitoring  of 
C02  output,  radiation  transport,  climate  science,  and  structural  mechanics.  With 
these  computational  and  mathematical  forward  models  come  uncertainty  concern¬ 
ing  computed  quantities  [83,  84].  In  physical  modeling  and  numerical  simulation, 
uncertainty  arises  from  lack  of  knowledge  corresponding  to  model  physics  and  as¬ 
sumptions  (epistemic  uncertainty)  and  inherent  variability  in  the  model  parameters 
(aleatory  uncertainty).  Epistemic  uncertainty  can  be  reduced  by  increased  knowledge 
and  is  generally  not  probabilistic  in  nature,  although  one  can,  to  some  extent,  handle 
epistemic  uncertainty  in  the  Bayesian  framework.  On  the  other  hand,  aleatory  un¬ 
certainty  is  typically  characterized  by  assigning  probability  distributions  to  uncertain 
or  random  parameters.  The  quantification  of  aleatory  uncertainty  is  performed  by 
tracking  these  probability  distributions  through  the  forward  simulations. 


1 


2 


This  work  is  focused  on  the  treatment  of  aleatory  uncertainty.  Common  tech¬ 
niques  for  quantifying  aleatory  uncertainty  are  intrusive  and  non-intrusive  expansion 
methods  [46] .  Intrusive  methods  include  local  and  global  polynomial  chaos  expansion 
methods  [42,  10,  TO].  Such  methods  seek  a  global  or  local  polynomial  representation 
of  the  outputs  of  interest.  The  name  intrusive  refers  to  the  fact  that  these  methods  are 
typically  not  “black  box;”  that  is,  numerical  implementation  of  such  methods  requires 
changes  to  black  box  forward  simulation  code  [83].  Moreover,  intrusive  methods  typ¬ 
ically  result  in  large  coupled  systems  to  be  solved.  Common  non-intrusive  methods 
are  sample  based  collocation  methods  such  as  (quasi-)Monte  Carlo  and  stochastic  col¬ 
location  [121,  123,  9].  Non-intrusive  methods  result  in  many  decoupled  deterministic 
forward  simulations  at  random  or  structured  collocation  points. 

The  stochastic  collocation  method  seeks  an  interpolated  polynomial  representa¬ 
tion  of  the  random  field  output  of  interest.  Furthermore,  provided  sufficient  regu¬ 
larity  of  the  output  of  interest,  stochastic  collocation  enjoys  rapid  convergence  as  a 
discretization  scheme.  This  is  contrary  to  Monte  Carlo  methods.  Monte  Carlo  meth¬ 
ods  converge  in  probability  at  a  rate  of  1  / \/Q  where  Q  is  the  number  of  random 
samples.  This  means,  one  requires  a  large  number  of  samples  to  decrease  the  error 
in  simulation.  Aside  from  the  slow  convergence  rate,  Monte  Carlo  avoids  the  curse 
of  dimensionality,  i.e.  the  convergence  rate  is  independent  of  the  dimension  of  the 
problem.  Stochastic  collocation  on  the  other  hand  exploits  regularity  of  the  output  of 
interest  to  speed  convergence  [9].  Unlike  Monte  Carlo  methods,  stochastic  collocation 
does  not  thwart  the  curse  of  dimensionality,  although  the  dependence  of  the  conver¬ 
gence  rate  on  the  stochastic  dimension  can  be  reduced  by  employing  sparse  grids 
[52,  87,  88,  15,  119,  32,  89,  92,  93].  Sparse  grids  were  first  introduced  by  Smolyak  in 
1963  [110]  and  present  an  efficient  means  of  approximating  tensor  product  problems. 
Sparse  grids  and  stochastic  collocation  converge  rapidly  for  sufficiently  smooth  prob¬ 
lems  and  are  vastly  superior  to  Monte  Carlo  methods  for  problems  with  a  modest 
number  of  random  variables. 
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Forward  simulation  and  uncertainty  quantification  are  typically  not  the  only  goals 
of  experimentalists.  In  many  applications,  these  forward  simulation  codes  are  used  to 
aid  in  control  and  design  of  physical  processes.  Also,  these  simulators  can  be  used  to 
infer  parameters  of  the  physical  system.  The  discretization  of  these  problems  results 
in  extremely  large  scale  constrained  optimization  problems  even  when  uncertainty 
is  not  considered.  When  uncertainty  is  added  to  the  forward  problems,  the  scale  of 
the  resulting  optimization  problem  increases  drastically.  Moreover,  the  addition  of 
uncertainty  requires  problem  reformulation.  These  reformulations  result  in  “robust” 
optimization  problems  where  the  goal  is  to  minimize  the  risk  associated  with  a  certain 
design  or  control.  Here,  risk  refers  to  a  measure  of  the  variation  of  the  objective  with 
respect  to  the  random  model  input  data.  These  measures  of  risk  are  studied  in 
depth  in  the  context  of  stochastic  programming.  A  particularly  convenient  and  rich 
class  of  risk  measures  are  the  coherent  risk  measures  [98,  99,  108].  These  measures 
exhibit  properties  desirable  for  optimization,  such  as  convexity  and  monotonicity. 
Reformulating  the  optimization  problem  may  also  result  in  chance  or  probabilistic 
constraints  where  one  requires  that  the  probability  that  certain  outputs  of  interest 
exceed  set  thresholds  be  less  than  a  set  “tolerable”  level  [116].  These  constraints  may 
arise  as  constraints  on  the  probability  of  failure  of  the  optimal  design. 

Risk  measures  and  chance  constraints  add  much  complexity  to  the  analytical  and 
computational  aspects  of  these  optimization  problems.  Naively  applied,  these  ad¬ 
ditions  typically  result  in  a  lack  of  differentiability  for  the  objective  function  and 
constraints  [108].  To  this  end,  traditional  gradient  based  optimization  methods  may 
not  apply.  Due  to  the  extreme  computational  complexity  of  these  problems,  it  is 
advantageous  to  use  gradient  based  methods  to  generate  reliable  and  efficient  opti¬ 
mization  routines.  In  order  to  apply  gradient  based  methods,  one  may  need  to  choose 
a  suitable  smooth  alternate  to  the  risk  measure  or  chance  constraints,  or  one  may  need 
to  reformulate  the  optimization  problem  by  adding  auxiliary  variables.  Care  must  be 
taken  when  reformulating  these  problems  and  discretizing  using  stochastic  collocation 
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on  sparse  grids,  which  produce  negative  quadrature  weights,  as  these  reformulations 
may  lead  to  inconsistencies,  i.e.  loss  of  convexity  of  the  objective  or  constraints.  The 
current  treatment  of  risk  measures  and  chance  constraints  typically  involves  the  use 
of  Monte  Carlo  methods  and  to  my  knowledge  has  not  been  tackled  using  polynomial 
chaos  or  stochastic  collocation  methods. 

Aside  from  problem  formulation  issues,  optimization  problems  governed  by  un¬ 
certain  forward  models  require  possibly  many  simulations  of  the  forward  model  per 
gradient  based  optimization  iteration.  Furthermore,  gradient  based  methods  require 
some  control  over  the  forward  simulation  code  in  order  to  compute  adjoint  states 
and  gradients.  Finite  differences  can  be  employed  to  avoid  modifying  black  box  for¬ 
ward  simulation  code,  but  finite  difference  computations  become  prohibitive  for  large 
scale  optimization  problems  as  each  gradient  computation  requires  multiple  forward 
simulations.  I  consider  adjoint  based  optimization  because  most  models  used  in  en¬ 
gineering  lend  themselves  to  adjoint  computations.  The  adjoint  approach  yields  an 
efficient  and  accurate  gradient  computation  at  the  expense  of  destroying  the  black 
box  quality  of  the  forward  simulator.  When  non-intrusive  uncertainty  quantification 
methods  are  used  with  the  adjoint  approach,  the  forward  and  adjoint  problems  at 
each  sample  can  be  solved  in  parallel  using  the  deterministic  solvers,  i.e.  no  new  code 
needs  to  be  generated  to  accommodate  uncertainty  when  computing  gradients  [67]. 

Even  the  use  of  adjoint  based  gradient  computation  does  not  solve  the  problem  of 
algorithmic  and  computational  inefficiency.  As  stated  above,  adjoint  based  computa¬ 
tions  require  many  forward  simulations  with  non-intrusive  methods  or,  in  the  case  of 
intrusive  methods,  the  solution  of  many  large  coupled  systems  of  equations.  Hence, 
optimization  quickly  becomes  prohibitive.  To  circumvent  this,  I  propose  to  use  adap¬ 
tive  uncertainty  quantification  methods  to  solving  these  optimization  problems  under 
uncertainty.  Optimization  is  an  ideal  setting  for  adaptivity  as  accuracy  in  model  sim¬ 
ulation  is  only  essential  when  close  to  the  minimizer.  Moreover,  optimization  gives  a 
natural  metric  to  guide  adaptivity.  I  have  developed  an  adaptive  sparse  grid  stochastic 
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collocation  framework  to  solve  these  optimization  problems.  My  adaptive  approach 
is  based  on  the  trust  region  algorithm  for  unconstrained  optimization,  which  provides 
an  exemplary  optimization  theoretic  framework  for  managing  approximate  models. 
In  this  case,  the  adaptivity  is  driven  according  to  the  size  of  the  gradient.  Therefore, 
as  one  approaches  a  first  order  critical  point,  the  model  is  increasingly  refined.  Fur¬ 
thermore,  it  is  possible  to  extend  the  trust  region  idea  to  some  classes  of  constrained 
problems  in  which  case  the  adaptivity  is  driven  by  the  projected  gradient.  To  this 
end,  I  propose  to  extend  my  adaptive  sparse  grid  collocation  trust  region  framework 
to  handle  more  general  constrained  and  chance  constrained  optimization  problems. 
The  goal  of  this  extension  is  to  develop  efficient  algorithms  and  software  to  handle 
challenging  engineering  application  problems. 

1.1  Literature  Review 

My  work  presented  in  this  thesis  lies  in  the  intersection  of  many  mathematical  fields 
such  as  optimization  theory,  probability  theory,  approximation  theory,  and  the  theory 
of  partial  differential  equations  (PDEs).  It  is  my  goal  in  this  section  to  review  some 
relevant  background  material  concerning  PDEs  with  uncertain  coefficients,  sparse 
grids,  optimization  problems  governed  by  PDEs  with  uncertain  coefficients,  trust 
region  methods,  and  adaptive  finite  element  methods. 

1.1.1  PDEs  with  Uncertain  Coefficients 

The  study  of  PDEs  with  uncertain  coefficients  is  a  relatively  new  subject,  but  the 
building  blocks  for  the  modern  analysis  of  such  PDEs  were  formed  in  the  early  twen¬ 
tieth  century  with  Norbert  Wiener’s  1938  development  of  “homogeneous  chaos”  or 
polynomial  chaos  expansion  [122],  This  polynomial  chaos  expansion  provides  a  poly¬ 
nomial  representation  of  Gaussian  random  fields  and  remains  a  popular  numerical 
method  of  solving  PDEs  with  uncertain  coefficients  [65,  72,  113].  Other  contributions 
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to  the  modern  theory  of  PDEs  with  uncertain  coefficients  stems  from  the  indepen¬ 
dent  works  of  Karhunen  [64]  and  Loeve  [71]  who  developed  a  general  Fourier  series 
representation  of  random  fields  known  today  as  the  Karhunen-Loeve  (KL)  expansion. 
More  recently,  polynomial  chaos  and  the  KL  expansion  have  been  employed  to  create 
numerical  methods  for  solving  PDEs  with  uncertain  coefficients. 

The  polynomial  chaos  methods  for  solving  PDEs  with  uncertain  coefficients  has 
received  much  attention  [54,  124,  65].  Coupling  the  polynomial  chaos  method  with 
finite  elements  or  other  numerical  PDE  solution  techniques  gives  a  robust  numerical 
solution  method,  but  suffers  from  the  need  to  compute  with  high  order  global  poly¬ 
nomials.  Babuska,  Tempone,  and  Zouraris  [11]  developed  a  finite  element  scheme 
known  as  stochastic  Galerkin  which  encompasses  polynomial  chaos  and  allows  for 
local  (discontinuous)  polynomial  representations  of  the  random  held  solution.  The 
stochastic  Galerkin  method  is  very  popular  due  to  its  rapid  convergence  rates,  but 
suffers  from  high  computational  costs.  The  stochastic  Galerkin  discretization  results 
in  a  large  coupled  linear  system  which  may  be  expensive  to  solve. 

Another  class  of  methods  known  as  sample  based  or  intrusive  methods  circumvent 
the  need  to  solve  a  large  coupled  linear  system.  This  class  of  methods  contains  Monte 
Carlo,  quasi  Monte  Carlo,  and  stochastic  collocation.  These  methods  lead  to  many 
decoupled  linear  systems  which  may  be  solved  in  parallel.  The  Monte  Carlo  and  quasi 
Monte  Carlo  methods  have  their  roots  in  probability  theory,  whereas  the  collocation 
method  has  its  roots  in  approximation  theory.  The  stochastic  collocation  method 
seeks  to  interpolate  the  random  held  PDE  solution  on  a  set  of  quadrature  nodes 
[123,  121,  86,  8].  Aside  from  being  possibly  more  efficient  than  the  stochastic  Galerkin 
method,  the  stochastic  collocation  method  also  enjoys  similar  convergence  properties 
to  stochastic  Galerkin  [14], 


7 


1.1.2  Sparse  Grids  and  Approximation  Theory 

As  mentioned,  stochastic  collocation  discretization  results  in  a  decoupled  system  of 
PDEs.  The  number  of  PDEs  to  be  solved  is  exactly  the  number  of  nodes  used  for  the 
interpolation  of  the  PDE  solution.  Therefore,  it  is  essential  to  choose  sets  of  nodes 
which  are  small  in  size  and  exhibit  high  polynomial  accuracy.  These  requirements 
warrant  the  use  of  so  called  sparse  grids.  The  sparse  grid  idea  was  developed  in  1963 
by  Smolyak  [111].  Recently,  sparse  grids  have  grown  in  popularity.  As  such,  the 
convergence  of  sparse  quadrature  and  interpolation  rules  is  well  known  [52,  87,  88, 
15,  119,  32,  89,  92,  93].  General  sparse  grid  rules  achieve  similar  orders  of  accuracy 
as  corresponding  tensor  product  rules,  but  have  far  fewer  nodes.  In  fact,  if  the  sparse 
grid  is  based  on  nested  one  dimensional  rules,  then  the  sparse  grid  is  even  sparser 
(i.e.  has  even  fewer  nodes).  Some  common  choices  of  nested  rules  are  the  Clenshaw- 
Curtis  rule  for  uniformly  distributed  random  variables  [40],  or  any  Gauss-Patterson 
rule  [91,  51].  Recently,  many  researchers  have  investigated  adaptive  and  optimized 
sparse  grids  [53,  55,  56],  where  the  goal  is  to  choose  a  sparse  grid  rule  which  is  both 
optimal  in  accuracy  and  number  of  nodes  given  a  certain  class  of  functions. 

1.1.3  PDE  Constrained  Optimization  Under  Uncertainty 

The  subject  of  optimization  problems  governed  by  PDEs  with  uncertain  coefficients 
lies  at  the  interface  of  PDEs  with  uncertain  coefficients,  optimization  in  Banach 
spaces,  and  stochastic  programming.  Stochastic  programming  offers  many  numer¬ 
ical  schemes  for  solving  problems  with  uncertainty,  such  as  the  sample  average  ap¬ 
proximation  (SAA)  and  the  stochastic  approximation  algorithm  (SA)  [90,  108,  75]. 
These  methods  are  Monte  Carlo  based  methods  and  thus,  not  central  to  this  the¬ 
sis.  Stochastic  programming  also  gives  the  framework  for  dealing  with  probabilistic 
(or  chance)  constraints,  as  well  as,  develops  the  theory  of  coherent  risk  measures 
[100,  98,  116,  115,  114],  Both  probabilistic  constraints  and  coherent  risk  measures 
are  vastly  important  to  the  topics  of  this  thesis. 


Combining  ideas  from  stochastic  programming  and  PDE  constrained  optimization 
[63]  gives  the  rich  theory  of  PDE  optimization  under  uncertainty.  Although  there  has 
been  much  work  in  the  held  of  statistical  inverse  problems  and  signal  recovery  [74,  76, 
118,  22],  very  few  researchers  actually  consider  optimal  control  of  uncertain  PDEs. 
The  few  works  that  have  considered  such  control  problems  lack  a  complete  theoretical 
framework  and  in  some  situations,  do  not  seek  necessarily  optimal  solutions  [104,  25, 
105,  27,  24,  26].  One  common  approach  in  these  sources  solves  the  control  problem  as 
a  sequence  of  deterministic  problems  defined  at  the  collocation  points.  The  controls 
are  then  taken  as  the  expected  value  of  the  computed  controls.  Controls  generated 
in  this  way  are  not  necessarily  optimal.  Thus,  there  is  a  strong  need  for  a  unified 
understanding  and  theory  of  PDE  optimization  under  uncertainty.  Some  of  this 
theory  can  be  found  in  [67]. 

1.1.4  Trust  Regions  and  Frameworks  for  Adaptivity 

As  hinted  at  earlier,  there  is  great  need  for  efficient  and  reliable  numerical  methods  for 
the  solution  of  optimization  problems  governed  by  PDEs  with  uncertain  coefficients. 
The  goal  of  such  methods  should  be  to  solve  optimization  problems  using  inexpensive 
approximate  models  whenever  possible.  This  framework  is  known  as  model  manage¬ 
ment  and  is  an  inexpensive  and  efficient  solution  method  for  optimization  problems 
governed  by  PDEs.  Typically,  model  management  frameworks  are  based  on  trust 
regions  due  to  their  flexibility  and  provable  global  convergence  [2,  3,  43,  44],  In  fact, 
one  can  prove  convergence  of  trust  region  methods  with  minimal  conditions  on  func¬ 
tion  and  gradient  exactness  [36,  81,  96,  41,  60].  Such  flexibility  is  ideal  for  model 
adaptation.  This  idea  of  model  adaptive  trust  regions  has  recently  been  exploited  in 
the  context  SQP  methods  in  [127].  Furthermore,  in  [16],  this  idea  has  been  applied 
to  the  solution  of  stochastic  programs  using  Monte  Carlo  sampling  techniques.  In 
this  thesis,  I  consider  a  novel  trust  region  approach  known  as  the  retrospective  trust 
region  method  [17].  The  retrospective  trust  region  updates  the  trust  region  radius 
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following  model  updates  (as  opposed  to  before).  As  such,  the  trust  region  radius  is 
updated  to  the  new  model  rather  than  the  old  model.  This  modification  may  decrease 
the  possibility  of  prohibitively  small  trust  region  radii. 

The  quality  of  these  adaptive  frameworks  is  contingent  on  the  quality  of  error 
estimators  used  for  adaptation.  In  the  case  of  optimization  problems  governed  by 
PDEs  with  uncertain  coefficients,  there  are  three  possible  sources  of  error  to  be  con¬ 
trolled.  First  of  all,  the  spatial  PDE  discretization  can  be  adaptively  controlled 
using  adaptive  finite  elements  based  on  well  known  a  posteriori  error  indicators 
[33,  34,  95,  18,  19,  20,  61,  62],  Another  source  of  error  in  the  stochastic  collocation 
finite  element  solution  arises  from  the  quadrature  rule  used  in  defining  the  collocation 
space.  Few  attempts  have  been  made  to  control  this  error  adaptively  [53,  73,  1]  and 
it  would  be  desirable  to  improve  these  estimates.  A  final  possible  source  of  error  is  in 
model  order  reduction  in  the  case  of  time  dependent  PDE  constraints  [21,  57,  4],  Al¬ 
though  model  reduction  is  not  the  main  topic  of  this  thesis,  the  trust  region  framework 
presented  here  is  valid  for  adaptive  model  reduction  in  PDE  constrained  optimization. 

1.2  Thesis  Outline 

In  this  thesis,  I  will  first  formulate  a  general  form  for  optimization  problems  governed 
by  PDEs  with  uncertain  coefficients.  In  this  formulation,  I  will  give  assumptions 
on  the  objective  function  and  PDE  constraint  that  guarantee  well-posedness  of  the 
optimization  problem.  Furthermore,  I  will  introduce  a  model  problem  for  this  thesis 
which  corresponds  to  the  distributed  control  of  an  elliptic  PDE.  Next,  I  will  develop 
the  stochastic  collocation  method  for  solving  PDEs  with  uncertain  coefficients  and 
extend  the  collocation  method  to  the  case  of  optimization.  Additionally  I  will  prove 
an  a  priori  error  bound  for  a  class  of  control  problems  solved  using  the  sparse  grid 
stochastic  collocation  finite  element  method.  Following  this  collocation  discussion,  I 
will  develop  general  sparse  grids  for  high  dimensional  interpolation  and  quadrature. 
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In  my  discussion  of  sparse  grids,  I  prove  new  interpolation  and  approximation  results 
for  general  sparse  grid  operators.  Subsequently,  I  will  present  a  novel  model  adap¬ 
tive  trust  region  approach  to  solving  optimization  problems  governed  by  PDEs  with 
uncertain  coefficients.  Here,  I  analyze  two  trust  region  frameworks:  the  basic  trust 
region  algorithm  and  the  retrospective  trust  region  algorithm.  For  the  basic  trust 
region  algorithm,  my  model  adaptive  framework  is  provably  global  convergent  due 
to  results  found  in  [60].  On  the  other  hand,  I  prove  the  global  convergence  of  the 
retrospective  trust  algorithm  under  a  weakened  gradient  conditions.  Finally,  I  will 
provide  a  brief  discussion  of  implementation  details  and  present  numerical  results. 


Chapter  2 


Optimization  Under  Uncertainty 


Uncertainty  is  present  in  nearly  every  physical  system  and  in  many  engineering  ap¬ 
plications  the  risk-averse  optimization  of  such  systems  is  crucial.  In  the  literature, 
the  concept  of  risk  is  predominately  applied  to  the  optimization  of  financial  portfo¬ 
lios.  My  goal  is  to  extended  the  concept  of  risk-averse  optimization  to  engineering 
and  science  applications.  For  example,  in  engineering  design  and  control  problems, 
risk-averse  optimization  can  be  used  as  a  certificate  of  reliability.  In  this  chapter, 
I  will  present  a  general  formulation  of  the  risk-averse  optimization  problem.  I  will 
develop  the  test  problem  that  I  will  consider  throughout  this  thesis.  In  the  determin¬ 
istic  setting,  this  test  problem  is  a  quadratic  program  posed  in  a  Banach  space.  An 
instance  of  this  test  problem  is  the  quadratic  optimal  control  of  linear  elliptic  partial 
differential  equations  (PDEs)  with  uncertain  coefficients.  This  test  problem  is  very 
insightful  and  sheds  light  on  a  general  formulation  for  these  optimization  problems. 
Following  the  test  problem  I  will  formulate  the  general  problem  setting  in  which  I 
will  study  these  risk-averse  optimization  problems  and  develop  my  adaptive  stochastic 
collocation  and  trust  region  framework.  Moreover,  I  will  formulate  assumptions  on 
the  general  optimization  problem.  These  assumptions  will  be  used  to  ensure  existence 
and  uniqueness  of  optimal  solutions  as  well  as  ensure  that  the  stochastic  collocation 
method  is  an  applicable  discretization  technique.  I  will  then  derive  the  adjoint  calcu¬ 
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lus  for  computing  derivatives  of  the  risk-averse  objective  function  and  present  some 
standard  results  concerning  tensor  products  of  Banach  spaces. 

2.1  Test  Problem 

Let  7 '~i  and  Z  be  Hilbert  spaces,  and  let  V  and  W  be  Banach  spaces.  1  will  begin  this 
chapter  by  presenting  the  archetypal  test  problem  used  throughout  this  thesis.  This 
optimization  problem  is  an  extension  of  the  deterministic  quadratic  program 

min  j(y,z)  :=  h\Qv  -  q\\2n  +  |||z||| 
vev,zez  z  z  (2  11) 

subject  to  Av  +  +  b  =  0, 

where  Q  G  £(V,7d)  is  an  observation  operator,  q  G  H  is  the  desired  state  of  the 
system,  A  G  £(V,  W)  is  the  state  operator,  B  G  C(Z,  W)  is  the  control  operator, 
and  b  G  W  is  an  inhomogeneity. 

2.1.1  Optimal  Control  of  PDEs 

An  instance  of  (2.1.1)  of  particular  interest  to  this  thesis  is  the  optimal  control  of  the 
linear  elliptic  PDE 


—V  •  (e(x)Vu(x))  =  z(x),  x  G  D  (2-1.2) 

v(x)  =0,  x  G  d D, 

where  D  C  Wl  for  d  —  1,  2,  3  denotes  the  physical  domain.  If  v  is  a  solution  to  (2.1.2) 
and  w  is  a  sufficiently  regular  test  function  such  that  w(x)  =  0  for  all  x  G  d D,  then 
integration  by  parts  results  in  the  variational  problem: 

Given  z  G Z  C  i/“1(Zl),  fold  v  G  V  :=  Hq(D)  such  that 

/  e(x)Vv(x)  ■  Vw(x)dx  =  /  z(x)w(x) dx  V  w  G  V. 

J  D  J  D 


(2.1.3) 
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For  simplicity,  let  Z  =  L2(D).  Defining  a:VxV->M  and  b  :  V  x  Z  — >  M  as 

a(v,w)  /  e(x)Vv(x)  ■  Vw(x)dx  and  b(w;  z)  /  z(x)w(x) dx,  (2.1.4) 
J  d  J  D 

the  variational  problem,  (2.1.3),  can  be  written  in  the  equivalent  form: 

Given  z  G  Z,  find  v  G  V  such  that  a(v,  w)  =  b(w,  z)  V  w  G  V. 

Furthermore,  assuming  e  G  L°°(D)  is  bounded  away  from  zero  almost  everywhere  in 
D,  i.e. 

3  emin  >  0  such  that  e(x)  >  emin  a.e.  in  D, 

then  a  is  a  coercive  and  continuous  bilinear  form  and  hence  the  Lax-Milgram  Theorem 
(Theorem  2.7.7  in  [29])  ensures  the  existence  of  a  unique  v(z)  —  v  e  V  which  solves 
(2.1.3). 

Now,  since  a(v,  •)  G  V*  for  all  »  G  V  and  the  mapping  v  G  V  h- >  a(v,  •)  G  V*  is 
continuous  and  linear,  there  exists  a  bounded  linear  operator  A  G  £(V,  V*)  such  that 

a(v,w)  —  { Av,w)v*y  Vr,wG  V, 

where  (•,  -)v*,v  denotes  the  duality  pairing  on  V.  For  a  more  thorough  discourse  on 
the  existence  of  A  see  Chapter  1.3  in  [63].  Similar  arguments  prove  the  existence  of 
the  bounded  linear  operator  B  G  C(Z,V*)  satisfying 

b(w;z )  =  —  (Bz,  w)v*,v  V  z  G  Z,  w  G  V. 

Therefore,  the  PDE  (2.1.3)  has  the  same  form  as  the  linear  constraint  in  (2.1.1). 
The  Lax-Milgram  Theorem  ensures  the  invertibility  of  the  operator,  A,  and  thus  the 
solution  to  (2.1.3)  has  the  specific  form 

v(z)  =  -A  xBz  =  Sz 

where  S  G  C{Z,  V)  denotes  the  solution  operator.  Note  that  the  solution  v(z)  =  v  G  V 
of  (2.1.3)  depends  linearly  on  the  control  variable,  z  G  Z. 
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Now  for  any  real  Hilbert  space,  7 1  3  V,  the  optimal  control  problem  is 

min  j(v,z )  :=  -lln  —  v\\%,  +  —  \\z\\2z 
vev.zez  ’  ’  2  11  llw  2 11  112 

subject  to  Av  +  B;  =  0. 


(2.1.5) 


Relating  (2.1.5)  to  (2.1.1),  W  =  V*,  Q  is  the  identity  operator,  and  v  =  q.  To  make 
this  concrete,  I  will  set  7i  =  L2(D).  Furthermore,  the  assumptions  on  e  permit  the 
reformulation  of  (2.1.5)  as  the  implicitly  constrained  optimization  problem  [59] 

min  j(z)  :=j(v(z),z)  =  h\Sz  -  v\\2n  +  °^\\z\\2z.  (2.1.6) 

zez  z  z 


2.1.2  PDEs  with  Random  Inputs 

Similar  analysis  as  above  holds  when  e  in  (2.1.3)  is  replaced  by  a  random  held  (i.e. 
a  family  of  coefficients,  e,  indexed  by  a  random  variable).  Let  denote  a 

complete  probability  space,  fl  is  the  set  of  outcomes,  T  C  2n  is  a  cr-algebra  of  events, 
and  P  :  T  — >  [0, 1]  is  a  probability  measure.  The  random  held  coefficient  is  defined 
as  e  :  11  x  D  — >  M  where  the  map  wefin  e(u,  •)  is  P-measurable.  Furthermore,  to 
extend  the  existence  and  uniqueness  result  from  the  deterministic  case,  it  suffices  to 
assume  that  e  G  L°°(Q  x  D)  is  bounded  away  from  zero  almost  everywhere  in  Q  x  D, 
i.e. 

3  emin  >  0  such  that  e(co,x)  >  emin  a.e.  in  fl  x  D.  (2-1-7) 

For  weaker  assumptions  on  e  see  Section  1  of  [9].  Substituting  this  random  held 
coefficient  into  (2.1.3)  gives  rise  to  the  PDE 

—V  •  (e(u;,  x) V«(w,  x))  =  z(x)  x  G  D,  a.e.  in  fl  (2.1.8) 

u( u,x)  =  0  x  G  dD,  a.e.  in  f 1. 

As  indicated  by  the  notation,  u(u,x),  the  solution  to  the  state  equation,  (2.1.8),  is 
also  a  random  held.  Moreover,  for  almost  all  w  G  fi,  the  solution  to  (2.1.8)  satisfies 
u( uj)  G  V. 
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In  order  to  discuss  the  weak  form  of  (2.1.8),  I  will  require  the  notion  of  a  Bochner 
space.  Bochner  spaces  are  formal  extensions  of  the  Lebesgue  spaces,  LPP(Q),  for 
functions  which  output  into  general  Banach  spaces  [125].  The  Bochner  space  Lpp{Vt\  V) 
is  defined  as 

Lp(h2;  V)  :=  :  fl  — >  V  :  v  strongly  measurable,  J  ||u(u;)||ycLP(c<;)  <  +cx) 

for  p  G  [1,  cx))  and 


Lp  (Q;  V)  :=  {v  :  — >  V  :  v  strongly  measurable,  ess-supwgn  ||u(cu)||v  <  +oo} 


for  p  =  oo.  Returning  to  (2.1.8),  assumption  (2.1.7)  and  the  Lax-Milgram  Theorem 
ensure  the  existence  of  a  unique  solution,  u  G  Lp(12;  V),  which  solves  the  variational 
problem: 


Given  z  G  Z,  find  u  G  Lp(h2;  V)  such  that 

e(u>,  x)X7u(lv,  x)  ■  Vw(lo,  x)dxdP(cv) 

z(x)w(u,  x)dxdP(uj)  Vw  £  Lp(Q;V). 


n  J  d 


(2.1.9) 


n  J  d 


2. 1.2.1  The  Finite  Noise  Assumption 

To  facilitate  the  numerical  solution  of  (2.1.3),  I  will  work  under  the  finite  noise 
assumption.  Assume  there  exists  a  finite  vector  of  M  independent  random  vari¬ 
ables  Y  :  Q  — >  T  with  T  :=  Tj  x  ...  x  Tm  with  T&  C  M  for  k  =  1, . . .  ,M  such 
that  e(u,  •)  =  e(T'(Ct;))')-  Furthermore,  suppose  that  each  random  variable  for 
k  =  1  ,...,M  has  Lebesgue  density  pk  :  Tfc  — >  [0,  oo]  and  the  vector  Y  has  joint 
density  p(y)  —  pi(yi)  ■  . . .  ■  Pm(vm)-  bi  this  case,  the  probability  measure  dP(u>)  can 
be  replaced  by  the  weighted  Lebesgue  measure  p(y)dy.  Additionally,  this  assumption 
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permits  the  change  of  variables  in  (2.1.9): 


Given  z  G  Z,  find  u  G  L2p{T ;  V)such  that 
f  P(y )  /  e(y,a:)Vu(y,i)  •  Vw(r/,  x)dxdr/ 


ID 


=  /  P(2/)  /  z(x)w(y,  x)dxdy  \/w  G  I^(r ;  V) 


id 


(2.1.10) 


where  the  p-weighted  Bochner  spaces  are  defined  analogously  as 

Lpp{T;  V)  :=  :  T  — >  V  :  v  strongly  measurable,  J  p(y)\\v(y)\\l,dy  <  +oo 

for  p  G  [1,  ex:)  and 


L“(T;  V)  :=  {v  :  T  — >  V  :  v  strongly  measurable,  ess-sup^gp  p(y)\\v(y)\\v  <  +c»}  . 

for  p  —  oo.  If  V*  is  separable,  then  for  p  G  (1,  oo) ,  the  topological  dual  spaces 
corresponding  to  LPp(T ;  V)  is  isometrically  isomorphic  with  Lp*  (r;  V*)  where  =  1. 
For  p  =  1,  the  topological  dual  of  V)  is  isometrically  isomorphic  to  L^(Y ;  V*), 
but  the  same  is  not  true  for  p  =  oo.  In  this  case,  L^°(r;  V)*  D  L^T;  V*)  [37,  125,  126]. 
The  uniform  cllipticity  (2.1.7)  for  (2.1.3)  can  be  restated  in  the  image  space  T  as 


3  emin  >  0  such  that  e(y,x)  >  emin  a.e.  in  rxh.  (2.1.11) 


This  condition  ensures  existence  and  uniqueness  of  solutions  to  the  variational  prob¬ 
lem  (2.1.10),  but  also  implies  additional  regularity  of  the  solution  with  respect  to 
y  G  T.  Assumption  2.1.11  can  be  used  to  prove  the  continuity  result 


\\u(y)\\m(D)  <  C\\z\\z  a.e.  in  T 

where  C  is  some  positive  constant  independent  of  y  G  T  (see  [9]).  Since  the  right 
hand  side  is  independent  of  y  G  T,  this  inequality  implies  u  G  L^°(T;V). 


2. 1.2. 2  The  Weak  Form 

Define  the  bilinear  form  a:VxVxT->Ias 

a{v,w\y)  =  [  e(y,x)Vv(x)  ■  Vw(x)dx, 


ID 
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the  linear  form  b  :  V  x  Z  — >  M  as  in  (2.1.4),  and  the  expected  value  operator, 
E  :  Lj(T)  -»•  R,  as 

E[X]  =  Jp(y)X(y)dy. 

Then,  the  weak  form,  (2.1.9),  can  be  equivalently  written  as: 


Given  z  E  Z,  find  u  E  L2(T',  V)  such  that 
E[a(u,w,  •)]  =  E[b(w;  z)]  L2(  T;  V). 

Throughout  this  thesis  the  weak  form  of  the  state  equation  will  be  denoted 
e  :  L2(T ;  V)x7->  Z^(T;  V)*  which,  in  this  case,  is  defined  as 


(2.1.12) 


e(u, z)  := 


a(u(y),  -;y)  -b(-;z 


The  uniform  ellipticity  assumption  (2.1.11)  and  the  Lax-Milgram  theorem  ensures 
the  existence  of  a  unique  solution  to  the  variational  problem  (2.1.12),  or  equivalently 


Given  z  E  Z,  find  w  E  L2{Y\V)  such  that 
(e(w,z),v)L 2(r;v)*,L2(r;v)  =0  Vv  E  L2(T;V). 


(2.1.13) 


Similar  to  the  discussion  concerning  the  deterministic  problem,  there  exist  bounded 
linear  operators  A  E  C(L2(T;  V),  L2(T-,  V)*)  and  B  E  C(Z ,  L2(T\  V)*)  defined  by 


(A u,w)L2{r.vytL2(r.v)=  p(y)a(u(y),w(y);y)dy  V  u,w  E  Tj(T;V)  (2.1.14a) 


and 


(B z,w)L*(r;vy,L*(rv)  =  ~  p(y)Kw(y)>z) dV  V  w  E  Lj(T;V),  z  E  Z,  (2.1.14b) 


such  that  (2.1.13)  has  the  form 


A  u  +  B^  =  0, 


The  Lax-Milgram  theorem  ensures  the  invertibility  of  A  and  therefore  the  solution 
to  (2.1.13)  can  be  written  as 

u(z)  =  — A_1B^  =  Sz 

where  S  E  jC(Z ,  L2(T ;V))  denotes  the  solution  operator.  As  above,  the  state,  u(z), 
depends  linearly  on  the  control  variable,  z  E  Z. 
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2. 1.2. 3  The  Parametric  Weak  Form 

For  the  numerical  solution  of  (2.1.12)  it  can  be  favorable  to  consider  the  following  the 
parametrized  variational  problem: 

Given  z  E  Z,  find  u  :  T  — >  V  such  that 

a(u(y) ,  w,  y)  —  b(w;  z)  a.e.  in  T,  V  w  E  V.  (2.1.15) 

To  be  concise,  I  will  use  the  notation  e :V  x  Z  xT  — >  V*  where 

e(u(y),  z;  y)  =  a(u(y ),  ■ ;  y)  -  b(  • ;  z)  =  0 

to  denote  the  state  equation.  Assumption  (2.1.11)  and  the  Lax-Milgram  theorem 
ensure  the  existence  of  a  point-wise  almost  everywhere  defined  function,  u  :  T  — >  V, 
which  solves  the  variational  problem  (2.1.15),  or  equivalently 

Given  z  6  Z,  find  u  :  T  — >•  V  such  that 

{e(u(y),z;y),w)v*tv  —  0  a.e.  V  w  G  V.  (2.1.16) 

The  continuity  bound  on  the  solution  of  (2.1.16)  implies  u  G  L£°(T ;  V)  and  finiteness  of 
the  probability  measure  p(y)dy  ensures  that  u  G  Lpp{T ;  V)  for  any  p  G  [1,  cxd)  U  {cxd}. 
In  this  test  case,  the  solution  u  of  (2.1.16)  is  also  a  solution  of  (2.1.13)  and,  by 
uniqueness,  these  solutions  coincide. 

Define  A (y)  G  £(V,  V*)  and  B (y)  G  C(Z,V*)  for  almost  all  y  G  T  as 

(A (y)v,  m)v*,v  =  a(v ,  w;y)  V  v,  w  G  V  a.e.  in  T 

and 

(B(y)^,  w)v*,v  =  — b(w ;  z)  \/  vj  E  V ,  z  E  Z  a.e.  in  T, 
respectively.  Then,  the  weak  form,  (2.1.15),  gives  rise  to  the  linear  operator  equation 


A (y)u(y)  +  B (y)z  =  0  a.e.  in  T. 
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The  Lax-Milgram  theorem  ensures  the  point-wise  almost  everywhere  invertibility  of 
A  and  therefore  the  solution  to  (2.1.15)  can  be  written  as 

u(y,  z)  =  -A~1(y)B(y)z  =  S (y)z  a.e.  in  T 

where  S (y)  E  C(Z,V)  for  almost  all  y  E  T  denotes  the  solution  operator. 

Now  for  p  E  [  1,  oo)  let 

U  =  LP(T;V)  =  LP(T-1H10(D)) 

The  parameter  p  of  integrability  will  be  chosen  in  the  next  section  depending  on  the 
risk  measure  used  in  the  objective  function  of  the  optimal  control  problem.  One  can 
generalize  the  operators  in  (2.1.14)  as  follows.  Define 

y  =  Lp(T;V*). 

Note  that  y*  =  Lp*  (T;  V)  where  1/p  +  1/p*  =  1.  Let  A  E  C(U ,  y)  and  B  E  C(Z,  y) 
be  defined  by 

(A  u,w)yy*  =  I  p(y)a(u(y),w(y);y)dy  V  u  EU,w  E  y*  (2.1.17a) 

(B-,  w)yy*  =  -  j  p(y)b(w(y);  z)dy  V  z  E  Z,  w  E  y* .  (2.1.17b) 

The  definition  of  A  and  A  implies 

(A U,w)yy*  =  E[(Au,w)v*,v]  V  U  EU,W  E  y*. 

2.1.3  Optimal  Control  of  PDEs  with  Random  Inputs 

Consider  the  optimization  problem  (2.1.5).  When  the  PDE  coefficient,  e,  in  (2.1.3) 
is  a  random  field,  the  solution,  u  E  U,  is  also  a  random  field.  Hence,  the  map 
V  l— ►  j(u(y),z)  :  T  — >  R.  for  fixed  z  E  Z  is  a  random  variable.  This  randomness  in 
the  objective  function  is  generally  handled  using  “risk  measures.”  Risk  measures  are 
operators  which  act  on  spaces  of  function  with  domain  T  and  codomain  M  such  as 
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Lq(T)  for  q  G  [1,  oo)  or  q  =  oo.  Throughout  this  thesis,  I  will  denote  the  risk  measure 
as 

a  :  L%T)  R. 


Assuming  j/gTh  j(u(y),  z )  is  a  function  in  Z/*(T),  the  risk-averse  optimization  prob¬ 
lem  corresponding  to  the  control  of  the  linear  elliptic  PDE  with  uncertain  coefficients, 
(2.1.9),  is 


mm 

u&A,  z&Z 


J(u,z )  ■■=  ^(\\u-v\\^j 


a , 


+ 


(2.1.18) 

subject  to  An  +  =  0. 

Recall  that  for  this  example,  7i  =  Z  =  L2(D),  and  that  A,  B  are  the  operators  defined 
in  (2.1.17).  Since  the  solution  to  the  weak  form  (2.1.9)  satisfies  u  GW  =  I^(T;V), 
the  map 


IMb)  -  ®ll  h  6  Lf(T)  or,  equivalently,  w  ||w  —  u||^  :  U  — >  LPJ2{T). 

Therefore,  any  risk  measure  used  in  (2.1.18)  must  have  domain  LPJ2{T).  Equivalently, 
given  a  risk  measure 

u  :  L%T)  ->  M 

one  requires  the  state  space 


U  =  Lf(T;V)  =  Lf(T;H^(D)). 
The  corresponding  image  space  for  the  operator  (2.1.17)  is 

y  =  L2p\Y-V*)  =  L2q{Y-H~\D)). 
Note  that  jh*  =  L2pq/{2q-l)(T-V)  =  L2q,^2q~l) (T-,  Hq(D)). 


2. 1.3.1  The  Reduced  Space  Formulation  and  Differentiability 

Optimization  problem  (2.1.18)  can  equivalently  be  written  in  the  unconstrained  re¬ 
duced  space  form  as 

nii inJ(z)  :=  <r(j(z-,y))  =  ^a(\\u(z)  -  vf^j  +  |||^||| 
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where  u  =  u(z )  G  U  solves  A u  +  =  0.  One  can  employ  the  adjoint  calculus 

to  compute  derivatives  of  J(z).  Notice  that  the  derivative  of  J(z)  requires  multiple 
applications  of  the  chain  rule:  the  derivative  of  the  risk  measure,  the  derivative  of 
the  Tf-norm,  and  the  derivative  of  the  solution  to  the  state  equation.  From  the 
deterministic  optimization  problem,  the  objective  function, 

*/  \  In  — m2  O-  1 1  o 

3{v,z)  =  -\\v-v\\n  +  -\\z\\z, 

is  continuously  Frechet  differentiable  with  respect  to  both  v  G  V  and  z  G  Z.  For 
general  functions  /  :  V  — >  M  which  are  Frechet  differentiable,  it  is  unclear  whether  or 
not  the  mapping  u  G  LP{T\V)  i— »  f(u)  is  also  Frechet  differentiable.  The  quadratic 
objective  function,  j,  is  an  example  where  Frechet  derivatives  in  U  =  Lpp(T ;  V),  p  —  2 q, 
are  explicitly  computable.  In  fact,  there  exists  a  positive  constant  C  >  0  such  that 

/I  i 

P(y)  2  HM(^)  “  ^  +  h(y)\\n  ~  2  ~  v\\2h  -  (u(y)  -  v,  h(y))n  d y 

=  ^p(y)\\Hy)\\2Hdy  <  <  c'll^ll^^ ;V)- 

This  proves  that  for  hxed  z  G  Z  the  mapping  u  G  U  t— j(u,  z)  is  Frechet  differentiable 
and  the  derivative  ju(u,z)  G  C{U,Lq(T))  for  hxed  z  G  Z.  This  result  is  extended  to 
more  general  functions  in  the  following  proposition. 

Proposition  2.1.1  Let  f  :  V  — >  M  be  Frechet  differentiable  arid  define  /f  :  VxV->R 
by  the  relationship 

I  f(v  +  h)  -  f(v)  -  f\v)h\  =  p(v,  h)\\h\\v.  (2.1.19) 

Consider  u  G  LP{T]  V)  with  p  G  (1,  oo)  or  p  =  oo  and  assume  the  map 

I/GTk  f(u{y))  G  Lqp(T) 

for  some  q  G  [l,p).  Then  f  :  LP{T ;  V)  — »•  Lq{T)  is  Frechet  differentiable  at  u  if 

lim  \\P(u,h)\\Ls(T)  =  0,  s:=  -3-  >  0. 

II  ^11  (r; v)  P  Q 


(2.1.20) 
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Proof:  Plugging  u,  h  G  LP(T;  V)  into  (2.1.19),  taking  the  power  q,  and  integrating 
over  T  yields 


p(y)\f(u(y)  +  h(y))-f(u(y))-f(u(y))h(y)\qdy  =  /  p(y)(3(u(y),h(y))q\\h(y)\\qndy. 


Applying  Holder’s  inequality  to  the  right  hand  side  gives 


i/s/  \  <i/p 

'  P(y)\\h\\pndy 


J^p(y)P(u(y),h(y))q\\h(y)\\qndy  <  I  J^p(y)P(u,  h)sdy 

=  ||/3(M,h)|||2(r)||h||^(r;H) 

where  s  satisfies  ^  ^  =  1  (i.e.  s  =  ^).  Thus,  if  (2.1.20)  holds,  then 


lim  ||  f(u  +  h)  -  f(u)  -  f  (u)h\\Li^r)  =  0 
and  /  :  LP(Y :  V)  — ■>  Lq(T)  is  Frechet  differentiable. 


□ 


Remark  2.1.2  For  fixed  z  G  Z,  the  quadratic  objective  function, 

f(u)  =j(u,z)  =  ^\\v-v\\^  +  ||HH, 

satisfies  the  assumptions  of  Proposition  2.1.1  for  p  =  2 q.  Furthermore,  s  =  2q  and 
\\P(vi  h)  lln^r)  =  H^  llLp'5,(r;V)’ 

To  conclude  the  formulation  of  optimization  problem  (2.1.18),  I  will  assume  that 


a  :  Lq(T)  — >  M  for  q  G  [1,  oo). 

By  the  uniform  ellipticity  assumption,  (2.1.11),  u  G  T“(T;  V)  and  one  can  choose  the 
state  space 

U-.=  L2q{V-V). 

Assuming  the  risk  measure,  cr,  is  Hadamard  differentiable,  the  Frechet  differentiability 
of  the  map,  u  ^  \\u  —  h||^,  guarantees  that  the  map, 


u  ^  <j{\\u-v\\2h), 


23 


is  Hadamard  differentiable  as  a  map  from  U  :=  L2q(T\  V)  into  M.  Therefore,  J(u,  z)  = 
|cr(||«  —  I’Wy)  +  f  is  Hadamard  differentiable  with  respect  to  u  E  U  =  L2q(F;  V) 
and  Frechet  differentiable  with  respect  to  z  E  Z. 

2. 1.3.2  The  Adjoint  Calculus 

Recall  a  :  Lq(T)  ->  R  for  q  E  [1,  oo),  V  =  H'(D ),  U  =  L2q(F;  V),  and  y  =  L2q(F ;  V*). 
I  will  derive  the  adjoint  calculus  for  this  optimization  problem  by  differentiating  the 
Lagrangian  functional,  L:WxZxy*->I  given  by 

L(u,  z,  A)  :=  ^<7(||u  -  v\\2h)  +  ||H||  +  (A u  +  Bz,  \)y,y*. 

The  Lagrangian  is  Frechet  differentiable  with  respect  to  both  A  G  y*  and  z  G  Z. 
On  the  other  hand,  L  is  Hadamard  differentiable  with  respect  to  u  G  U  since  the 
risk  measure,  cr,  is  Hadamard  differentiable.  Setting  the  Frechet  derivative  of  L  with 
respect  to  the  Lagrange  multiplier,  A  G  3^*,  to  zero  returns  the  state  equation  (2.1.13). 
Differentiating  L  with  respect  to  the  state  variable  u  Eli  and  setting  the  Hadamard 
derivative  to  zero  results  in  the  adjoint  equation 
d 

0  =  —L(u,  z,  A )5u  =  E[Va(\\u  —  b||^)(w  —  v,  5u)n\  +  (A Su,  A )y,y* 

=  E[Wa[\\u  -  v\\n)(u  -  v,  5u)n]  +  (A*A,  Su)u*,u  (2.1.21) 

where  A*  E  C(y* ,U*).  For  the  linear  elliptic  PDE  (2.1.8),  the  following  relationship 
holds 

(u,A*w)u,u*  =  (A  u,w)yy*  =  E[(  Au,  w)v*,v]  =  E[(u,A*w)vy*] 

for  all  u  Eli,  w  E  y* .  Therefore,  the  adjoint  operators  A  (y)*  E  £(V,  V*)  for  almost 
all  y  E  T  and  A*  E  C(U,U*)  satisfy 

(u,  A *w)u,u*  =  E[(u,  A*w)v,v*]  V  u  EU,w  E  y*. 

Substituting  this  equivalent  parametrized  expression  for  A  in  (2.1.21)  yields 

0  =  E[Va(\\u  —  v\\n)(u  ~  v,  Su)n  +  (A*A,  8u)v*y]. 
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Now,  since  TL  =  L2(D )  C  V*  =  H  1(D),  the  adjoint  equation  can  be  reformulated  as 

0  =  E[Va(\\u  —  u|&)(I(w  —  v),  Su)v*y  +  (A* A,  5u)v*,v] 

=  E[(Va{\\u  —  v\\2n)I(u  —  v)  +  A*A,  hu)v*,v] 

where  I  G  £( 7i,V*)  denotes  the  injection  operator  from  7i  into  V*. 

As  with  the  parametrized  weak  form,  (2.1.15),  it  may  be  beneficial  for  the  numer¬ 
ical  solution  of  (2.3.1)  to  consider  the  parametrized  adjoint  equation 

A(y)*X(y)  +  Vcr(||tt  -  v\\2n)I{u(y)  -  v)  =  0  (2.1.22) 

where  u  EU  =  L2q(Y;V).  Since  a  G  Lq(Y),  it  holds  that  Vcr(||u  —  u||^)  G  LqJ^q~l\ r). 
Together  with  I (u  —  v)  G  L2q(Y ;  V*),  one  obtains 

Vcr(||u  -  v\\2n)I(u  -v)e  L2q^2q-l\Y]  V*). 

Moreover,  the  Lax-Milgram  theorem  ensures  the  existence  of  a  unique  solution,  X(y)  G 
V  for  almost  every  y  G  T,  to  (2.1.22).  Additionally,  the  uniform  cllipticity  assumption 
implies  that  ||A(-)“*||  G  L™(Y).  Hence,  the  solution  of  the  parametrized  form  of  the 
adjoint  equation  (2.1.22)  satisfies  A  G  F  =  L2pq^2q~1^  {Y]  V).  For  this  linear  elliptic 
test  problem,  the  solution  to  the  parametrized  adjoint  equation,  (2.1.22),  and  the 
adjoint  equation,  (2.1.21),  coincide  due  to  uniqueness  of  solutions.  In  its  strong  form, 
the  parametrized  adjoint  equation,  (2.1.22),  corresponds  to  the  PDE 

—V  •  ( e(y ,  x)VX(y,  x ))  +  Vu(||w  —  b|| ^)(u(y,  x)  —  v(x ))  =  0  iGh,  a.e.  in  Y 

X (y,x)  =  0  x  G  d D,  a.e.  in  T. 

Finally,  the  gradient  of  J(z)  can  be  computed  as 

vjfc)  =  ^zL(u(z),z,Kz)) 

where  u(z )  =  u  GW  solves  the  state  equation  (2.1.15)  and  A(^)  =  A  G  y*  solves  the 
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adjoint  equation  (2.1.22).  Thus,  the  gradient  is  given  as  follows 

VJ(z)Sz  =  a(z,5z)z  +  (B<5z,  A)^* 

=  (az,  Sz)z  +  E[(B8z,  A)v*,v] 

=  (az,  8z)z  +  £[(B*A,  5z)z] 

=  (az  +  E[B*\],5z)z 

where  for  a.a.  y  G  T,  B(y)*  G  C(V,Z)  and 

y  ~  B(#ri(}) :  r  —*  c(z,  L«P»-1)(D), 

since  in  our  example  ||B(-)*||  G  L^°(T). 

The  final  equality  above  is  due  to  Fubini’s  theorem  [49].  This  gives  the  following 
expression  for  the  gradient 

WJ(z)  =  az  +  E[  ETA], 

2. 1.3.3  Risk  Measures 

For  demonstration  purposes,  I  will  now  compute  the  derivative  of  J(z)  for  a  class  of 
risk  measures  paying  special  attention  to  the  integrability  requirements  of  the  risk 
measure.  Members  of  the  class  under  consideration  have  the  form 

a(Y)  =  E[Y]+cE[p(Y-E[Y])} 

where  <p  :  M  — >  M  is  assumed  to  be  differentiable.  Particular  members  of  this  class 
of  risk  measures  are  the  “mean  plus  moment”  risk  measures  and,  with  some  care, 
the  derivations  presented  here  apply  to  the  “mean  plus  semi-deviation”  risk  measure. 
Computing  the  derivative  of  a  yields 

E[Va(Y)rj\  =  E[rj]  +  cE[p'(Y  -  E[Y})(V  -  E[rj})} 

=  E[rj\  +  cE[p\Y  -  E[Y])V\  -  cE[p\Y  -  E[Y])\E[rj\ 

=  E[n  +  c(p’{Y  -  E[Y})  -E[p\Y  -  E[Y])])V 
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which  gives  the  gradient  of  a  as 

Vff(F)  =  1  +e(p'(Y  -  E{Y])-E[p'(Y  -  £[K])]). 

As  a  first  example,  I  will  focus  on  the  mean  plus  variance  risk  measure.  In  this  case, 
p(Y)  =  |y2  and  a  has  domain  L2(T).  Consequently,  the  state  space  is  U  =  LAp{Y\  V) 
so  that  for  u  G  U  the  map  y  >  \\u(y)  —  h||^  G  L2(T).  For  this  risk  measure,  p'(Y )  =  Y 
and  the  derivative  of  a  has  the  particularly  simple  form 

Va(Y)  =  1  +  c(Y  -  E[Y])  6  L%T). 

In  the  case  of  mean  plus  semi-deviation,  p(Y)  =  (V]+  =  max{y,  0},  and 
a  :  Lj(r)  — >  M.  In  this  case  the  state  space  is  U  =  L2p{Y\V).  Clearly,  p  is  not 
differentiable  at  Y  =  0,  but  is  continuously  differentiable  everywhere  in  M  \  {0}.  In 
fact,  p'(y)  =  1  if  y  >  0  and  p'(Y)  =  0  if  Y  <  0.  Therefore,  if  Y  E  Lj(r)  and 
y  ^  E[Y ]  almost  everywhere  in  T,  then 

=  1  l  +  c(l-Pr(y  >S[F]))  if  Y>E[Y] 

\  1  -cl'ri:V  >  /■;[>'])  if  Y<E[Y}. 

Clearly  Va(F')  G  ^“(r)  for  all  Y  G  L^iY)  such  that  Y  ^  E\Y ]  almost  everywhere  in 
T  since  |Vcr(y)|  <  1  +  c. 

2.2  General  Problem  Formulation 

Let  V  and  Z  be  reflexive  Banach  spaces,  and  let  W  be  a  Banach  space.  Throughout 
this  thesis,  V  will  denote  the  deterministic  state  space  and  Z  will  denote  the  control 
space.  As  with  the  test  problem,  the  stochastic  program  considered  in  this  thesis  is 
an  extension  of  the  deterministic  equality  constrained  problem 

min  j(v,z )  subject  to  e(v,z)—  0,  (2-2.1) 

t>SV,  z£Z 

where  the  objective  function,  j  :  V  x  Z  — >  M,  corresponds  to  the  “cost”  associated 
with  a  given  state  v  G  V  and  a  given  control  z  G  Z  and  the  equality  constraint, 
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e  :  V  x  Z  — >•  W,  represents  the  governing  dynamics  (PDE  or  system  of  PDEs).  The 
stochastic  variant  of  (2.2.1)  is  generated  by  adding  uncertainty  or  randomness  to  the 
state  operator,  e.  Throughout  this  thesis,  the  control  variable  will  be  deterministic 
and  thus  adding  uncertainty  to  e  induces  randomness  in  the  state  variable.  The 
stochastic  state  space  will  be  denoted  U.  As  seen  in  the  test  problem,  the  uncertainty 
in  the  state  variable  induces  uncertainty  in  the  objective  function  which  is  typically 
handled  with  risk  measures.  The  risk-averse  optimization  problem  corresponding  to 
(2.2.1)  is 

min  cr(j(u,z))  subject  to  e(u,z)  =  0,  (2.2.2) 

where  a  is  a  risk  measure,  and  e  :  U  x  Z  — >  y  for  some  real  Banach  space  y  denotes 
the  stochastic  state  equation.  This  goal  of  this  section  is  to  make  clear  the  functional 
analytic  framework  for  (2.2.2)  by  making  assumptions  on  function  spaces  as  well  as 
the  objective  function  and  state  equation.  The  assumption  presented  here  ensure  well 
posedness  of  (2.2.2). 

2.2.1  The  State  Equation 

To  begin,  1  will  present  assumptions  on  that  state  equation  to  ensure  (2.2.2)  is  well 
posed.  Let  (£2,  JF,  P )  be  a  complete  probability  space  where  £2  is  the  set  of  outcomes, 
T  C  2n  is  a  a-algebra  of  events,  and  P  :  T  — >  [0, 1]  is  a  probability  measure.  When 
uncertainty  is  added  to  the  deterministic  state  operator  e,  I  will  denote  the  stochastic 
variant  as  e(u)  :  V  x  Z  — >  IV  for  almost  every  uj  G  £2.  My  Erst  assumption  is 
a  common  assumption  in  the  literature  and  is  known  as  the  “Finite  Dimensional 
Noise  Assumption.”  Finite  dimensional  noise  assumes  the  operator,  e(u),  has  a  finite 
dependence  on  the  random  variable  u  G  £2.  This  finite  dependence  will  facilitate  the 
numerical  solution  of  the  optimization  problem  (2.2.2). 


Assumption  2.2.1  (Finite  Dimensional  Noise)  There  exists  a  vector  of  random 
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variables,  Y  —  [Yi, ,  Ym ]  :fl->rc  such  that 

e(uj)  =  e(Y(u)) 

where  Yi  :  — >  r,:  C  R.  are  independent  random  variables  with  image  space 

Tj  =  [a*,  bi],  a*  <  bi,  and  Lebesgue  density  pi  :  T*  — >  M  for  i  =  l, ... ,  M.  The  joint 

image  space  of  Y  is  V  =  n^=i  H  and  the  joint  density  is  p  =  n^ii  Pi- 

This  assumption  allows  for  the  change  of  variables 

e(y)(v,z)  =  e(v,z;y)  =  0  Vt/gT.  (2.2.3) 

Furthermore,  this  permits  the  use  of  finite  sampling  and  polynomial  approximation 
schemes  in  solving  (2.2.3)  (c.f.  [9,  10,  11]).  The  Karhunen-Loeve  (KL)  expansion  of 
a  random  field  is  an  infinite  dimension  extension  of  the  singular  value  decomposition 
(SVD)  [64,  71,  103].  Truncating  the  KL  expansion  gives  a  finite  noise  approximation 
of  the  random  field  and  this  truncation  is  often  used  to  satisfy  Assumption  2.2.1.  I 
will  discuss  this  expansion  technique  in  more  detail  in  Section  2.4. 

The  parametrized  state  equation  (2.2.3)  will  be  utilized  when  solving  (2.2.2)  nu¬ 
merically,  but  will  not  be  used  in  the  analysis  of  (2.2.2).  Let  y  be  a  space  of  functions 
on  T  with  values  in  W.  Furthermore,  let  U  be  a  space  of  functions  on  T  with  values 
in  V.  The  state  equation  used  for  analysis  is 

e(u,z)  =  0  (2.2.4) 

where  e  :  U  x  Z  — >  y .  To  make  the  spaces  y  and  U  concrete  and  to  make  the 
definition  of  risk  measures  and  other  quantities  coherent,  I  will  assume  integrability 
of  the  functions  in  U  and  y  with  respect  to  y  G  T.  This  integrability  will  ensure  that 
the  solution  to  the  state  equation  and  elements  in  y  have  certain  statistical  moments. 

Assumption  2.2.2  (Integrability  of  the  State)  The  state  space  for  (2.2.4)  is  the 
Bochner  space 

U  Lpp(T;V)  for  p  e  (1,  oo)  or  p  —  oo 
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and  the  parametrized  state  operator  satisfies 

(u,  z,  y)  ^  e(u{y),  z-y)  :  U  x  Z  x  T  -  y  :=  L*(T;  W)  (2.2.5) 

for  some  s  E  [1,  oo). 

Remark  2.2.3  Equation  2.2.5  in  Assumption  2.2.2  implies 

V  >->  (A (y),e(u(y),z-,y))w*,w  :  T  ->  Lj(r) 

for  all  A  6  y*  :=  Ls*  ( r;  VV)  with  ^  =  1,  u  eW,  and  z  E  Z  or,  equivalently , 

(u,  z,  A)  !-)•  (A,  e(w,  z;  -))>v*,w  :li  x  Z  x  y*  ^  L*(T).  (2.2.6) 

The  state  equation,  (2.2.4),  is  thus  defined  by  the  relationship 
(A,  e(u,  ^))y*,y  =  J  p(y){A(y),  e(u(y),  z;  y))w,w&y  =  E[( A,  e(u,  2;  -))w,w]  (2.2.7) 

for  all  Ae7. 

To  ensure  that  optimization  problem  (2.2.2)  is  well  defined,  for  each  z  E  Z,  there 
must  exist  u(z)  =  u  Eli  such  that  (2.2.4)  is  satisfied.  To  simplify  presentation,  1  will 
use  the  following  notation  to  denote  partial  Frechet  derivatives 

d 

eu(u,z )  :=  —e(u,z)  E  C(U,y) 

and  similarly  for  derivatives  with  respect  to  z  E  Z.  1  will  employ  similar  notation  for 
derivatives  of  the  objective  function  j. 

Assumption  2.2.4  (Existence  of  Solution  Mapping) 

•  For  all  z  E  Z  there  exists  a  unique  u  Eli  such  that  e(u,  z)  =  0; 

•  There  exists  an  open  set  E  C  li  x  Z  with 

S  :=  {(u,  z)  Eli  x  Z  :  e(u,  z)  =  0}  C  E 
such  that  e(u,z )  are  Frechet  differentiable  on  E; 
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•  The  inverse  eu(u,  z)  1  G  C(y,U )  exists  for  all  ( u,z )  G  S. 

Assumptions  2.2.4  ensure  that  the  Implicit  Function  Theorem  (Theorem  1.41  in  [63]) 
holds  for  each  z  G  Z.  In  addition  to  Assumption  2.2.4,  I  will  require  differentiability 
of  the  parametrized  state  operator,  e. 

Assumption  2.2.5  Consider  the  parametrized  state  operator,  e(y)  :  V  x  Z  — >  >V  for 
fixed  y  G  T.  Then  the  mapping 

(u,  z,  y )  I— »•  e(u,  z;  y)  :  U  x  Z  x  T  — >•  y 

is  Frechet  differentiable  with  respect  to  (u,  z)  G  E.  Furthermore,  the  partial  derivative, 
eu,  has  a  bounded  inverse 

eu(u(y),  z-,y)1  G  £(W,  V)  V  (u,  z)  G  E,  a.e.  mT 

and  the  partial  derivative,  ez,  satisfies 

E[( A,  e*(u,  z;  -)fc)w*,w]  =  #[(e*(u,  zj  ■)  A,  <$z)z*,z:] 

=  (EK(U,  ■)X},5z)z*,z-  (2.2.8) 

Assumption  2.2.5  ensures  the  Frechet  derivatives  of  the  state  operator,  e,  can  be 
written  in  terms  of  the  Frechet  derivatives  of  the  parametrized  state  operator,  e, 

(A ,eu{u,z)5u)y*y  =  I  p(y)(X(y),eu(u(y),z-y)du)w*tWdy 

and 

(A,  ez(u,  z)8z)y.y  =  J  p(y){X(y),  ez(u(y),  z;  y)5z)w*}wdy. 

Moreover,  equation  2.2.8  is  a  condition  similar  to  the  conclusion  of  Fubini’s  Theorem 
[49], 
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2.2.2  The  Objective  Function 

Assumption  2.2.4  ensures  the  existence  of  a  unique  solution,  u  G  U,  to  the  state 
equation  (2.2.4)  for  all  z  G  Z.  Furthermore,  this  assumption  guarantees  the  well- 
posedness  of  the  reduced  formulation 

min  J(z)  :=a(j(z;y))  (2.2.9) 

for  appropriate  risk  measures  a.  Under  the  finite  noise  assumption,  the  reduced 
objective  function  is 

y  *-»•  j{z-,  y)  ■=  i{u(y\ z),  z) :  r  -»•  m. 

To  make  concrete  the  notion  of  risk  measure,  I  will  require  that  the  map  y  >  j(z;y) 
is  sufficiently  integrable. 

Assumption  2.2.6  (Integrability  of  the  Objective  Function)  For  u  G  U  = 

Lp(T ;  V),  the  mapping  j/GTh  j(u(y),  z )  G  Lq(T)  for  q  G  [l,p). 

Remark  2.2.7  Assumption  2.2.6  is  equivalent  to 

(u,  z)  ^  j(u,z)  :U  x  Z  -»•  Lqp(T). 

Of  particular  interest  to  this  thesis  are  derivative  based  algorithms.  These  algo¬ 
rithms  require  differentiability  of  j(v,z).  In  addition  to  differentiability  of  j(v,z ),  I 
will  require  differentiability  of  the  risk  measure,  n(Y).  Since  J(z)  =  cr(j(z;y))  is  a 
composite  function,  the  chain  rule  must  hold  for  the  derivatives  of  cr(Y). 

Assumption  2.2.8  (Differentiability  of  the  Objective  Function) 

•  The  deterministic  objective  function,  j  :  V  x  Z  — >  M,  is  Frechet  differentiable 
with  respect  to  v  G  V  and  for  fixed  z  G  Z  the  functional  /3\  :  Vx2xV-»K 
defined  by 


\j{v  +  h,z)  -j(v,z)  —  jv(v,  z)h\  =  fii(v,z,h)\\h\\v 
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satisfies 

pq 

lim  \\Pi(u,z,h)\\Lep)  =  0  where  9  := - >0 

^ ^  p  Q 

for  each  u  <EU  =  Lpp(Y ;  V). 

•  The  deterministic  objective  function,  j,  is  also  Frechet  differentiable  with  respect 
to  z  G  Z  and  for  fixed  v  G  V  the  functional  /32:Vx2xZ->8  defined  by 

| j(v,z  +  h)  -  j(v,z )  —  jz(v,  z)h\  =  f32(v,z,h)\\h\\z 

satisfies,  for  fixed  u  EU, 

lim  \\(32(u,z,h)\\Li{r)  =  0  V  z  G  Z. 

\\h\\z-+0 

•  The  partial  derivative,  jz(u ,  z )  for  ( u,z )  EU  x  Z,  satisfies 

E\jz{u ,  z)Sz]  =  E[jz{u,  z)]8z  V  5z  E  Z. 

Assumptions  2.2.8  and  Proposition  2.1.1  ensure  that  the  deterministic  objective  func¬ 
tion  is  Frechet  differentiable  with  respect  to  u  EU  and  z  G  Z.  Furthermore,  Assump¬ 
tion  2.2.6  implies  that 

ju(u,  z)  e  CilA ,  Lqp(T))  and  jz(u,  z)  G  C{Z,  Lqp{T)). 

The  final  condition  in  Assumption  2.2.8  is  related  to  the  conclusions  of  Fubini’s  The¬ 
orem  [49]. 

Since  for  u  G  U  and  z  G  Z  the  function  y  i— >•  j(u(y),  z)  G  Lqp(T),  our  risk  measure 
must  satisfy 

a  G  L%T). 

Of  course,  since  Jr  p(y)dy  =  1,  any  risk  measure  a  G  Lrp(T)  with  r  >  q  satisfies 
u  G  Lqp(T). 

To  guarantee  the  existence  of  derivatives  of  the  reduced  objective  function,  J{z\ 
I  will  require  the  following  assumption  on  the  risk  measure. 
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Assumption  2.2.9  (Differentiability  of  Risk  Measure)  The  risk  measure, 
a  :  Lqp(T)  — >  M,  is  Hadamard  differentiable. 

Assumption  2.2.9  implies  that  J(z )  is  at  least  Frechet  differentiable  and  a  gradient 
exists  if  Z  is  a  Hilbert  space.  Note  that  Hadamard  differentiability  is  required  since 
Hadamard  differentiability  is  the  weakest  form  of  differentiability  for  which  the  chain 
rule  for  computing  derivatives  of  composite  functions  holds.  For  a  review  of  the  vari¬ 
ous  notions  of  differentiability  in  linear  topological  spaces  see  [6,  7,  107].  Furthermore, 
to  reinforce  notation,  Table  2.1  contains  a  description  of  the  numerous  powers  used 
throughout  this  section  in  the  definition  of  the  Lebesgue  and  Bochner  spaces. 


Power 

Range 

Description 

P 

[1,  ex:)  U  {oo} 

Integrability  of  state  space,  U 

q 

Integrability  of  ( u ,  z)  >  j(u,  z)  for  (u,  z)  G  U  x  Z 

and  domain  of  risk  measure 

s 

[1,  ex)  U  {oo} 

Integrability  of  codomain  of  the  state  operator,  y 

Table  2.1:  Description  of  the  powers  for  the  Lebesgue  and  Bochner  spaces  used  in 
the  assumptions  on  the  objective  function  and  state  equation 

To  ensure  the  existence  of  at  least  one  minimizer  of  (2.2.9),  I  require  that  J(z )  is 
weakly  lower  semi-continuous  and  satisfies  the  infinity  property  [63]. 

Assumption  2.2.10  (Infinity  Property)  For  all  sequences,  { zn }  C  Z  such  that 
1 1 Ai 1 1 z  —■ >  +oo,  the  objective  function  satisfies  J(zn )  — >  +oo. 

In  addition,  a  unique  minimizer  is  guaranteed  when  J(z)  is  uniformly  convex. 

An  extremely  useful  class  of  deterministic  objective  functions,  j  :  V  x  Z  — >  M,  is 
the  class  of  “separable”  objective  functions 


j(v,z)  =jl(v)+j2(z) 
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where  j\  :  V  — >  M  is  convex,  j2  :  Z  — >  M  is  uniformly  convex,  and  j(v,  z)  satisfies  As¬ 
sumptions  2.2.8  and  2.2.6.  This  is  the  case  for  quadratic  control  or  least  squares  type 
problems  with  regularization.  For  these  separable  objective  functions,  the  implicitly 
constrained  optimization  problem  (2.2.9)  can  be  written  as 

mmJ(z)  :=  <r  (ji{u{y,  z))  +  j2(z)y 

The  class  of  separable  objective  functions  warrants  a  few  assumptions  on  the  risk 
measure,  cr(Y).  Note  j2(;z)  does  not  depend  on  y  E  T  and  thus  should  not  contribute 
to  the  risk  associated  with  the  state  u  GW.  Furthermore,  since  j(v,  z)  =  ji(v)  +  j2(z) 
is  convex,  cr(Y)  should  maintain  this  convexity. 

Assumption  2.2.11  (Coherent  Risk  Measure)  Assume  a  :  Lq(T)  — >  M  satisfies 

•  Convexity:  For  all  l'i,  Y2  E  Lgp(T)  and  A  G  [0, 1], 

<r(A Yi  +  (1  -  A )Y2)  <  Acr(Yi)  +  (1  -  A)cr(y2); 

•  Monotonicity:  For  all  Y±,  Y2  E  Lqp(T)  such  that  Y\  <  12  o.e., 

a(Fj)  <  a(y2); 

•  Translation  Equivariance:  For  all  Y  E  Lqp(T)  and  c  E  M, 

cr(Y  +  c)  =  a(Y)  +  c; 

•  Positive  Homogeneity:  For  all  c  >  0  and  Y  E  Lq(T), 

cr(cy)  =  ccr(Y). 

Risk  measures  which  satisfy  Assumptions  2.2.11  are  called  coherent  in  the  sense  of 
Artzner,  Delbaen,  Eber,  and  Heath  [5].  These  assumptions  imply  the  following  form 
for  the  separable  objective  function 

J(z)  =  a(ji(u(y,z))S)  +h(z). 

Furthermore,  J{z)  is  uniformly  convex  whenever  a{j\ (u(y,  z)))  is  convex. 
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Remark  2.2.12  Two  particularly  important  coherent  risk  measures  are  the  mean 
plus  semi- deviation  risk  measure  and  the  conditional  value-at-risk  (CVaR)  risk  mea¬ 
sure.  The  mean  plus  semi- deviation  risk  measure  of  order  q  is  the  risk  measure 
<jr  :  Lqp(Y)  — >  M  defined  by 

ar(Y)  :=  E[Y]  +  cE[[Y  -  E[Y]]q+]1'« 

for  0  <  c  <  1  where  [x]+  =  max{i,  0},  [100].  On  the  other  hand,  the  CVaR  risk 
measure  is  a  :  Lfi  iT)  — »  M  defined  by 

ccvaR(Y )  :=  min  ( t  +  cE[[Y  —£]+]) 

t(zM.  t  ) 

for  c  >  1  [117],  These  two  coherent  risk  measures  fall  into  a  more  general  class  of 
risk  measures  defined  by  the  auxiliary  function  ar  :  R.  x  Lqp(T)  — >  M  defined  by 

ar(t,Y )  :=t  +  cE[[Y  -t]q+]1/q. 

That  is,  the  mean  plus  semi-deviation  and  CVaR  risk  measures  can  be  expressed  as 

ar(Y)  =  ar(E[Y],Y)  and  aCvaii(Y )  =  min  afit,Y), 

respectively.  Furthermore,  these  two  coherent  risk  measures  are  Hadamard  differen¬ 
tiable,  i.e.  they  satisfy  both  Assumption  2.2.8  and  Assumption  2.2.11. 

2.3  The  Adjoint  Calculus 

I  will  now  derive  an  adjoint  calculus  for  computing  the  derivative  of  J(z)  (c.f.  see 
Chapter  1.6  of  [63]  for  a  detailed  discussion  of  adjoints).  To  clarify  notation,  1  will 
denote  the  derivative  of  J(z )  as  J'{z)  G  Z* .  For  the  risk  measure  a,  the  derivative 
or'(Y)  can  be  associated  with  Vcr(Y')  G  Lq*  (T)  where  1  =  -  +  4  and  a'(Y')s  = 
E[\/a(Y)s]  for  all  s  G  Lq(T). 

To  derive  the  adjoint  calculus,  consider  the  Lagrangian  functional 

L:U  xZ  xy* 
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defined  by 

L(u,  z,  A)  :=  cr(j(u,  z))  +  (A,  e(u,  z))y*y. 

Note  that  the  Lagrangian  is  at  least  Hadamard  differentiable  with  respect  to  u  G  U 
and  z  G  Z  by  Assumptions  2.2.8,  2.2.9,  and  2.2.4.  Furthermore,  L  is  Frechet  differ¬ 
entiable  with  respect  to  the  Lagrange  multiplier  A  G  y*  by  linearity.  The  Lagrangian 
functional  is  often  used  as  a  local  lower  support  function  for  proving  optimality  con¬ 
ditions  and  satisfies  the  condition 

(■ J>{z),Sz)z*,z  =  {Lz(u,z,  \),5z)z*,z 

whenever  u  EU  and  A  G  y*  satisfy 

Lu(u,z,  A)  =  0  and  L\(u,  z,  A)  =  0. 

Differentiating  L  with  respect  to  the  Lagrange  multiplier  A  G  y*  and  setting  the 
derivative  equal  to  zero  returns  the  state  equation  (2.2.4).  On  the  other  hand,  setting 
the  derivative  of  L  with  respect  to  u  €  U  to  zero  yields  the  adjoint  equation 

0  =  Lu(u,  z,  A )Su  =  E\y<r(j(u,  z))ju(u,  z)5u\  +  (A,  eu(u,  z)6u)y*>y 
=  E\y<j(j(u,  z))ju(u,  z)8u\  +  (eu(u,z)*\,5u)u*,u 

for  all  5u  G  U  where  ju(u,  z)  G  C(U,  Lqp(T))  and  eu(u,z)*  G  C(y* ,14*)  denotes  the 
adjoint  of  eu(u,z).  Now,  unraveling  the  duality  pairing,  (•,  •  )u*:u,  and  employing 
Assumption  2.2.5,  the  adjoint  equation  can  be  written  as 

0  =  E[(Va(j(u,  z))ju(u,  z)  +  eu(u,  z;  -)*A,  5u)v*y]  VSueU.  (2.3.1) 

As  in  the  test  case,  it  may  be  beneficial  to  consider  the  parametrized  adjoint 
equation 

eu{u(y),z-,y)*\(y)  +  Va(j(u,z))ju(u(y),z)  =  0  a.e.  gL  (2.3.2) 

Assumption  2.2.5  ensures  that  eu(u,z;y)  has  a  bounded  inverse  almost  every  where 
in  T:  therefore,  (2.3.2)  has  a  unique  solution,  A (y)  G  W*  for  almost  all  y  G  T.  By 
Assumption  2.2.4,  the  solutions  to  (2.3.2)  and  (2.3.1)  coincide. 


37 


Finally,  differentiating  the  Lagrangian  with  respect  to  z  G  Z  and  applying  As¬ 
sumptions  2.2.8  and  2.2.5  yields 

Lz(u,  z,  A )5z  =  E[Wa(j(u,  z))jz(u,  z)8z]  +  (A,  ez(u,  z)8z)y*:y 
=  (E[Va(j(u,  z))jg(u,  z)  +  e*(u,  z\  -)A ],8z)z*,z 

for  all  8z  G  Z  where  jz(u,z )  G  C(Z,  Lqp(T))  and  e*z(u,z)  G  C(y*,Z*)  denotes  the 
adjoint  of  ez(u,z).  Therefore,  for  fixed  z  G  Z,  if  u(z)  =  u  G  U  solves  (2.2.4)  and 
A(^)  =  A  G  y*  solves  (2.3.2),  then  the  derivative  of  J  is 

.? (z)  =  E\Va(j(u,  z))jg(u ,  z)  +  e*z(u,  z;  -)A].  (2.3.3) 

Remark  2.3.1  If  cr  is  a  coherent  risk  measure  (i.e.  satisfies  Assumptions  2.2.11) 
and  j  is  a  separable  objective  function  with 


j(v,z)  =  ji(v)  +  j2(z), 


then  (2.3.3)  can  be  simplified  to 


J'(z)  =  E  e*z(u,z;-) A  +  j'2(z) 

where  u  =  u(z)  G  U  solves  (2.2.4)  and  A  =  A(z)  G  W*  solves 


(2.3.4) 


e*u(u(y),  y)^  +  v<r(ji(u))ji(u(y))  =  °- 


(2.3.5) 


Now,  returning  to  the  test  problem  (2.1.1),  the  objective  function  j  is  separable  so 
Remark  2.3.1  applies.  Recall  the  objective  function  is  given  as  j(v,  z)  =  ji(v)  +  j2(z) 
where 

3i(v)  =  ^||Qv  -  q\\2n  and  j2(z)  =  |||^|||. 

The  adjoint  equation  (2.3.5)  for  this  specific  objective  function  is 

My)*x(y)  +  Vo-(||Qm  -  «||w)Q*(Q^(s/)  -  g)  =  o 

where  Q*  G  C(TL* .  V*)  denotes  the  adjoint  operator  associated  with  Q  and  u  =  u(z)  G 
U  solves  the  state  equation  for  a  given  z  G  Z.  On  the  other  hand,  the  derivative  of 
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J(z)  is  handled  in  a  similar  fashion  to  that  of  (2.3.4).  First  notice  that  the  derivative 
of  the  deterministic  objective  function,  j(v,  z ),  with  respect  to  z  G  Z  in  the  direction 
s  G  Z  satisfies 

( jz(v,z),s)z*,z  =  (o(Z,  s)z  =  (aR z,s)z*,z 

where  R  G  C(Z,Z*)  is  the  unique  operator  satisfying  (R z,s)z*,z  =  {z,s)z-  This 
gives  rise  to  the  following  expression  for  the  derivative  (2.3.4) 

J'{z)  =  «r^  +  f;[b*a] 

where  A  £  7  solves  the  adjoint  equation.  Note  that  since  Z  is  assumed  to  be 
a  Hilbert  space  and  J(z)  is  at  least  Hadamard  differentiable  by  Assumption  2.2.8, 
Riesz  Representation  Theorem  (e.g.  see  Theorem  1.4  in  [63])  ensures  the  existence 
of  a  representer  for  J'(z )  in  Z  (i.e.  the  gradient).  The  above  calculations  give  the 
following  expression  for  the  gradient  of  J (z) 

X7J(z)  =  az  +  w 

where  w  =  w(z)  G  Z  solves 

R w  =  E  [v<7(||Qu  -  g||w)B*A] , 

u  =  u(z)  G  U  solves  the  state  equation,  and  A  =  A(z)  G  y*  solves  the  adjoint 
equation. 

2.4  The  Karhunen-Loeve  Expansion 

In  this  section,  I  will  discuss  a  common  technique  for  satisfying  the  finite  noise  as¬ 
sumption,  Assumption  2.2.1.  This  technique  is  known  as  the  Karhunen-Loeve  (KL) 
expansion  [64,  71].  The  KL  expansion  gives  an  infinite  series  representation  of  a  given 
random  field.  Often,  this  series  representation  is  truncated  and  the  original  random 
field  is  replaced  by  a  partial  sum. 
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Now,  let  e  :  ff  x  D  — >  M  be  a  random  field  on  the  probability  space  (h2,  T ,  P)  with 
finite  second  order  moments,  e  G  Lp(ff;  L2 (D)) .  Associated  with  e  is  the  covariance 
function 

Ce(x,  x)  ■=  E  (e(-,  x)  -  E[e(-,  x)])(e(-,  x)  ~  E[e(-,  *)]) 

where  E  :  Lp(T)  — >  M  denotes  the  expected  value  operator  E[X\  =  JQX(oj)dP(oj). 
The  covariance  function,  Ce(x,  x),  describes  the  spatial  correlation  of  the  random  field 
e.  Furthermore,  the  covariance  function  induces  a  linear  operator,  Tc  e  £(L2(D),  E2(D)), 
defined  by 

T e<f>(x)=  [  Ce(x,x)<i>(x) dx- 

J  D 

Tc  is  a  compact,  positive,  and  self  adjoint  operator.  Furthermore,  Mercer’s  Theorem 
ensures  the  existence  of  an  orthonormal  basis  of  eigenfunctions  C  L2(D)  and 

eigenvalues  {Afc}^=1  C  (0,  oo)  such  that 

OO 

Ce(x,  X)  =  X^k(x)ek(x) 

k=  1 

(see  Theorem  11  in  Chapter  30,  Section  5  of  [69]).  These  eigenfunctions  and  eigen¬ 
values  induce  the  following  decomposition  of  the  random  field  e, 

OO 

e(u,  x)  =  E[e(-,  x)]  +  ^  ^/Ykek{x)Yk{u)  (2.4.1) 

fc=i 

where  Yk( u)  G  E2P(tt)  satisfy 

E[Yk]  =  0  and  E[YjYk]  =  Sjk  V  j,  k  =  1,  2, . . . 

The  expansion  (2.4.1)  is  known  as  the  KL  expansion  of  the  random  field  e.  The 
convergence  rate  of  the  partial  sums  of  the  expansion  (2.4.1)  is  completely  dependent 
on  the  decay  of  the  eigenvalues,  Xk,  which  depend  on  the  covariance  function  (see 
[106]).  Such  decay  induces  anisotropy  in  the  random  field  e.  Anisotropy  here  means 
that  some  directions,  Tk,  have  a  larger  effect  on  the  random  field  e  than  others.  As 
mentioned  above,  a  common  practice  is  to  replace  the  random  field  e  with  a  truncated 
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KL  expansion,  i.e. 

M 

e(cu,  x )  <-  eM(cu,  x )  :=  E[e(-,  x)]  +  ^  yj Ykek(x)Yk(u ;). 

fc=i 


2.5  Tensor  Product  Function  Spaces 

The  general  formulation  dictates  the  use  of  the  Bochner  space,  L®(T;V),  as  the 
stochastic  state  spaces.  Furthermore,  in  constructing  a  method  for  the  solution  of 
e(u(y),z;y )  =  0  for  all  y  G  Y  and  fixed  z  G  Z,  the  stochastic  collocation  method 
seeks  to  interpolate  u  =  u(y)  G  V  on  a  finite  set  of  knots  in  the  parameter  space, 
T.  Therefore,  an  application  of  stochastic  collocation  will  require  a  finite  number  of 
solves  of  the  state  equation  e(u(y),  z\  y)  =  0.  As  is  common  in  approximation  theory, 
convergence  of  this  interpolant  is  highly  dependent  on  the  regularity  of  u  —  u(y )  with 
respect  to  the  parameter,  y  e  T.  Assumption  2.2.2  guarantees  that  the  solution  u  is 
a  member  of  the  Bochner  space 

u  eU  :=  Lqp(T ;  V)  for  some  1  <  q  <  oo. 

Moreover,  Assumption  2.2.4  implies  u  =  u(y)  exists  for  every  y  e  T.  Hence,  it  is 
often  realistic  to  assume 

«eCp°(T;V). 

In  the  case  of  linear  elliptic  PDEs  with  uncertain  inputs,  certain  regularity  of  these 
coefficients  on  the  parameters,  y  G  T,  ensures  u  G  C™(T]V).  In  fact,  Assump¬ 
tion  3.2.1  implies  this  fact  for  general  constraints.  In  this  section,  I  will  develop 
theory  for  three  classes  of  tensor  product  spaces.  First,  I  will  discuss  the  construction 
of  Bochner  spaces.  Secondly,  I  will  present  a  few  results  concerning  C°(r;V),  and 
finally,  I  will  provide  some  results  for  C'^(r). 

The  goal  of  this  Bochner  space  discussion  is  to  relate  L^(r;V)  with  the  spaces 
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Lq(T)  and  V.  First,  define  the  tensor  product  space 

f  N 

Lqp{T)  0  V  :=  fnVn  :  {fn}  c  Lqp(T),  {vn}  C  V,  and  TV  G  N 

t  n=  1 

Such  sums  are  well  defined  since  functions  in  Lq(T)  output  into  M  and  V  is  a  real 
Banach  space.  It  is  clear  from  the  definition  of  Lq(T)  0  V  that 

rj(r)®vcL2(r;v), 

i.e.  for  v  =  Yln=i  fnvn  G  Lqp{Y)  0  V,  it  is  easy  to  see  that 

N 

IMIlJKPV)  —  ll^n II V II  fn  || Lp(r)  <  00 • 

n=l 

With  the  appropriate  choice  of  norm  on  Lq(T)  <g)  V  (c.f.  the  projective  norm,  see 
Chapter  2.3  in  [102]),  one  can  show  that  the  completion  of  Lq(T)  <g)  V  is  isometrically 
isomorphic  to  Lq(T ;  V).  This  choice  of  norm  is  associated  with  the  natural  norm  on 
the  Bochner  space  via  the  relationship  for  /  G  Lq(T)  and  v  G  V 

\\f®v\\l  :=  J  p(y)\\f(y)v\\1dy. 

Here,  the  tensor  product,  <g)  is  defined  in  the  standard  way,  i.e.  (/  <g)  v)  :  t/  w  f(y)v. 
Note  that  this  product  is  well  defined  since  V  is  closed  under  scalar  multiplication. 
Hence,  it  is  sufficient  to  approximate  the  elements  u  G  L^(r;V)  by  elements  in  the 
tensor  product  space,  u  G  Lq(T)  <g)  V. 

In  the  same  manner,  one  can  relate  C°(r;  V)  to  the  spaces  Op(r)  and  V.  Define 
the  tensor  product  space 

f  N 

C°p(T)  0  V  :=  fnVn  ■  {/n}  c  CP(rh  M  c  V,  and  TV  G  N 

t  n=l 

Again,  such  sums  are  well  defined  since  functions  in  C[p(Y)  real  valued  and  V  is  a  real 
Banach  space.  Furthermore,  it  is  clear  that 


cj(r)®vccj(r;v) 
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by  similar  arguments  as  used  in  the  Bochner  space  discussion.  Now,  with  the  appro¬ 
priate  choice  of  norm  on  Cp(T)  8  V  (c.f.  the  injective  norm,  see  Chapter  3.2  in  [102]), 
one  can  show  that  the  completion  of  (7p(r)  (8)  V  is  isometrically  isomorphic  to  the 
Banach  space,  (7p(r;  V).  This  choice  of  norm  is  associated  with  the  natural  norm  on 
the  space  Cp(T;  V)via  the  relationship  for  /  G  Cp(T)  and  v  6  V 

\\f®v\\£  :=  snp  \\p(y)f(y)v\\v- 
ye  r 

The  tensor  product  f  ®v  is  defined  as  in  the  Bochner  space  discussion.  Therefore,  it 
is  again  sufficient  to  approximate  a  function  u  G  Cp(T;  V)  by  a  function  in  the  tensor 
space,  u  G  Cp(T)  8)  V. 

Since  Cp(T;  V)  can  be  associated  with  the  completion  of  Cp(T)  8  V,  I  will  focus  on 
approximation  in  (7p(r)  8  V.  In  order  to  do  this,  I  would  like  to  associate  C^(T)  with 
the  one  dimensional  function  spaces  C°pk(Yk)  for  k  —  1, . . . ,  M.  In  this  case,  I  can 
build  approximation  operators  for  C*°(r)  based  on  one  dimensional  operators  acting 
on  C°fc(rfc).  As  before,  define  the  tensor  product  space 

{N  M 

V  n  h,n  ■■  {/»,„}  c  C°n(rk)  and  N  £  N 

n=  1  k= 1 

Clearly,  this  definition  implies 

cp°i(r1)8---8Cp°M(rM)ccp°(r) 

and  using  the  appropriate  norm  (again,  the  injective  norm  [102]),  one  can  show  that 
the  completion  of  C'pi(Ti)  8  •  •  •  8Cpju(T m)  is  isometrically  isomorphic  to  Cp(T).  The 
norm  in  this  case  is  defined  by  the  relationship  for  fk  G  Cpfc(TA,),  k  —  1, . . . ,  M, 

ll/i  8  •  •  •  8  }m lie  :=  sup  \pi(yi)fi(yi)  ■  ■■■■  Pm(?/m)/m(z/m)|- 
y  er 

Here,  the  notation  f1  8  •  •  •  8  /m  is  associated  with  the  mapping 


(/i  8  •  •  •  8  /m)  :  y  ^  fi(yi)  •  •  •  •  • 
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With  this  said,  the  numerical  methods  in  this  thesis  will  attempt  to  approximate 
functions  in  C'^(T)  g  V.  Furthermore,  to  approximate  the  parametric  dependence  in 
C°(r),  it  will  suffice  to  only  work  in  C'^  (T) )  g  •  •  •  g  Cf}M( r m)- 


Chapter  3 


The  Stochastic  Collocation  Method 


Stochastic  collocation  is  a  non-intrusive  method  for  solving  the  high  dimensional 
parametric  equation  (2.2.3), 


e(u(y),  z\  y)  —  0  a.e.  in  Y 


for  fixed  z  E  Z.  Stochastic  collocation  is  an  interpolation  based  method  and  relies 
on  the  regularity  of  the  solution  u  G  U  to  (2.2.3)  with  respect  to  y  G  Y  in  order  to 
achieve  rapid  convergence  rates.  In  this  chapter  I  will  review  the  stochastic  collocation 
discretization  technique  for  the  solution  of  the  parametric  equations.  I  will  then 
extend  these  techniques  to  the  case  of  the  optimization  problem  (2.2.9) 

min  J(z)  :=  a(j(u(y- z),  z)) 

where  u(z)  =  u  EU  =  LP(Y]  V)  is  the  solution  to  (2.2.3).  Furthermore,  I  will  formu¬ 
late  the  stochastic  collocation  method  using  the  abstract  approximation  operator, 


cq  :  cj(r)  -»•  cq(y)  c  c'J(r). 


The  set  Cq(T)  is  a  finite  dimensional  subspace  of  C°(r)  and  the  operator,  Cq,  only 
requires  a  finite  number  of  function  evaluations  at  the  points  in  the  set,  Mq  C  Y  with 
\Mq\  =  Q.  Associated  with  Cq  is  the  quadrature  operator,  Eq  :=  EoCq  :  Clp(Y)  — >  M. 
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The  operators,  Cq  and  Eq,  are  typically  sparse  grid  or  tensor  product  operators.  The 
notion  of  sparse  grids  is  developed  in  Chapter  4.  Concluding  this  chapter,  I  will  prove 
error  bounds  for  stochastic  collocation  applied  to  optimization  problems  governed  by 
parametric  equations. 

3.1  Collocation  for  Parametric  Equations 

Consider  the  parametric  equation 

e(u(y),  z;  y)  =  0  a.e.  in  T  (3.1.1) 

where  u  G  C'°(T;V)  and  z  6  2.  This  equation  typically  represents  a  parametrized 
PDE  or  PDE  with  uncertain  coefficients.  As  in  Chapter  2,  e(y)  :  V  x  Z  — >  W  for 
y  G  T.  The  stochastic  collocation  method  builds  an  approximate  solution  to  (3.1.1) 
on  a  finite  set  of  “collocation  points.”  The  approximation  operator,  Cq ,  eluded  to 
above  is  a  linear  operator  which  depends  on  a  finite  number  of  function  evaluations 
at  points  in  Mq  =  {yi, . . . ,  2/q}  C  T  with  \Mq\  =  Q.  Furthermore,  1  will  assume  that 
Cq  has  the  form 

Q 

(CQf)(y)  =  ]T pt(v)f( yk)  v  /  6  cj(r) 

k= 1 

where  Pk  G  Cq(T)  for  k  =  1, . . . ,  Q.  In  Chapter  4,  I  will  construct  specific  operators, 
Cq,  as  tensor  product  or  sparse  grid  approximation  operators.  In  the  case  that  Cq  is 
a  tensor  product  operator  or  a  sparse  grid  operator  built  on  nested  one  dimensional 
interpolation  knots,  Cq  is  interpolator^  i.e.  for  any  /  G  (7°(r) 

(£Qf)(y)  =  f(y)  Vi/gATq, 

or  equivalently,  Pk(yj)  =  &kj  f°r  k,  j  =  1  On  the  other  hand,  if  Cq  is  a 

sparse  grid  operator  built  on  non-nested  or  weakly  nested  one  dimensional  interpo¬ 
lation  knots,  then  Cq  is  not  interpolatory.  These  interpolation  results  are  presented 
Chapter  4,  Proposition  4.2.5.  For  the  purposes  of  this  thesis,  the  finite  dimensional 
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approximation  space,  Cq(T),  is  assumed  to  be  a  polynomial  space  and  in  the  context 
of  tensor  product  and  sparse  grid  operators,  Cq(T)  can  be  explicitly  determined  (see 
Proposition  4.2.4  in  Chapter  4). 

With  this  definition  of  Cq,  the  stochastic  collocation  solution  to  (3.1.1)  is 

Q 

uq(v)  =  {£q)u(v)  =  Y pk(y)u(yk )• 

k= 1 

To  compute  uq,  one  needs  to  solve  (3.1.1)  for  all  y  G  Mq,  where  Mq  denotes  the  Q 
interpolation  knots  associated  with  Cq.  Thus,  one  must  solve 

e(u(y),z-,y)  =  0,  yefifQ 


in  order  to  compute  Uq. 

3.2  Regularity  and  Interpolation 

The  stochastic  collocation  method  depends  on  the  point- wise  solution  of  (3.1.1);  there¬ 
fore,  sufficient  regularity  of  the  solution  to  (2.2.4),  u  eh,  with  respect  to  the  parame¬ 
ters,  y  G  T,  is  essential.  First  of  all,  u  G  WflC°(T;  V)  in  order  for  point  evaluations  of 
u  to  be  well  defined.  Moreover,  one  must  ensure  that  the  error  committed  through  the 
approximation  operator,  Cq,  decays  as  the  number  of  collocation  points  increases.  In 
the  context  of  polynomial  approximation  and  interpolation,  the  following  assumption 
on  u  G  U  is  sufficient  (c.f.  see  Chapter  7,  Section  8  in  [45]). 

Assumption  3.2.1  (Analyticity  of  the  State)  For  fixed  y*  G  T 3  with  j  k, 
define 

u*k(yk)  ■=  u(y*n yl_i,yk, y*k+ 1, 

The  function,  u*k  :  Tk  — >  V,  has  an  analytic  extension  on  the  open  elliptic  disc, 
Drk  C  C  containing  Tfc.  Specifically,  ifTk  =  [—1,1],  then  Drk(Tk )  is  the  region 
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bounded  by  the  ellipse 

(  Th  +  rT1  Th  —  rr1 

£rk  :=  <  z  =  t  +  is  £  C  :  t  —  - - — —  cos0,  s  =  - - — —  sin  <j),  (f  G  [0,  27r) 

I  Z  Z 

(3.2.1) 

Ellipses  in  the  complex  plane  are  natural  choices  when  analyzing  interpolation  and 
quadrature  errors  because  the  space  L2(Drk)  is  a  reproducing  kernel  Hilbert  space, 
therefore  point  evaluation  is  a  continuous  linear  functional.  For  more  information  on 
reproducing  kernel  Hilbert  spaces  and  quadrature  error  analysis  see  [112].  There  are 
possibly  many  conditions  on  e(u,z;y)  for  which  Assumption  3.2.1  holds.  1  will  now 
present  one  such  set  of  assumptions.  The  assumptions  made  here  are  extensions  of 
Assumption  1  in  [68].  Assumption  1  in  [68]  are  specific  for  the  case  where  e(u,z;y) 
is  linear  in  u  and  z.  Furthermore,  similar  assumptions  were  made  for  the  case  of 
truncated  KL  expansions  in  Lemma  3.2  of  [9].  The  proof  of  Theorem  3.2.2  below 
follows  similar  arguments  as  the  proof  of  Lemma  3.2  in  [9]. 

Theorem  3.2.2  Suppose  the  Implicit  Function  Theorem  holds  for  (3.1.1)  and  sup¬ 
pose  T  is  bounded.  Let  z  G  Z  be  fixed,  u(z)  =  u  G  CQp(T]  V)  solve  (3.1.1),  and  define 
for  fixed  y*  eTjt  j  ^  k, 

ek(u,  z\  yk)  :=  e(u*k(yk),  z-y{, . . . ,  y*k_v  yk,  y*k+v  . . . ,  y*M). 

If  for  each  k  —  1, . . . ,  M,  there  exists  b  <  +oo  and  C  =  C(z)  >  sup,yfcgrfc  \\u*k{yk',  z)  ||v 
such  that 

sup  \\(duek(u(z),z-,yk))~1(dykek(u(z),z;yk))\\  <  Cbn  Vn  G  N,  (3.2.2a) 

Vk&Vk 

sup  \\(duek(u(z),z-,yk))~l(dlfek(u(z),z-,yk))\\  <n\bn  Vn  G  N, 

yfcerfe 

then  u  —  u(z)  satisfies  Assumptions  3.2.1. 


(3.2.2b) 
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Proof:  By  implicitly  differentiating  the  equation  e(u,z\y)  =  0  with  respect  to  ijk 
and  applying  the  triangle  inequality, 

dnyul{yk)  ^  <  Yduek(u{z),z]yk)Y1{d^kek(u{z),z]yk)) 

+  5Z(f)  (duek(u(z),z;yk))~1(d%(u(z),z-yk))  d™~ku*k(yk) 
k=  1  '  ' 

Define  Rn  :=  ^ ||^yfcwfc(2/fc)|lv’  then  R0  <  C  and  by  the  bounds  (3.2.2) 

n 

Rn  <Cbn  +  Y,  bm Rn-m  <  C(2b)n. 

m=  1 

Thus,  for  all  7  G  C  such  that  1 7  —  y^  <  r  < 

00  c 

7)  =  Rn ^  ~  Vk^n  and  llufc(7)llv  <  1  _  2br  <  +oo-  (3-2.3) 

n= 0 

The  relations  (3.2.3)  hold  for  all  yk  G  T*,;  therefore,  u*k  has  an  analytic  extension  on 
the  region 

Efc(r)  :=  {7  6  C  :  (7  -  yk\  <  r  Vyfc  G  Tfc} 

i.e.  the  union  of  all  balls  of  radius  r  with  center  yk  for  all  yk  G  Tk.  This  implies  that 
u*k  has  an  analytic  extension  on  any  ellipse  contained  in  the  set  T,k(r).  □ 

Remark  3.2.3  A  similar  result  holds  in  the  case  that  T  is  unbounded.  To  arrive  at 
this  result,  one  requires  an  auxiliary  distribution  which  decays  sufficiently  fast.  For 
more  information  on  unbounded  random  variables  and  auxiliary  distributions,  see  [9]. 

Throughout  this  chapter,  I  will  use  an  assumption  on  the  error  associated  with 
the  interpolation  operator,  Cq.  I  will  use  a  general  error  bound  and  track  this  error 
through  the  optimization  problem  to  the  optimal  controls. 

Assumption  3.2.4  Suppose  f  :  T  — >  M  has  an  analytic  extension,  then 


||/-^g/||L~(r)  <CQ~V 
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where  Q  denotes  the  number  of  interpolation  knots  associated  with  Cq,  C  =  C(f)  >  0, 
and  v  =  u(f)  >  0. 

A  common  form  of  the  quantity  C  —  C(f)  is  C  —  C supy€V  \f(y)\  where  V  C  C. 
Moreover,  Assumption  3.2.4  and  the  tensor  product  structure  of  L“(T;  V)  (see  Sec¬ 
tion  2.5)  imply  that  the  error  committed  by  the  stochastic  collocation  method  is 

\\u  — Cqu\\l™(t-v)  <C{u(z))Q  v.  (3.2.4) 

This  error  bound  is  proved  in  Theorems  3.8  and  3.13  of  [85]  for  anisotropic  Smolyak 
sparse  grids  built  on  Clenshaw-Curtis  and  Gaussian  knots  respectively.  Similar  error 
bounds  are  proved  in  Theorems  4.1  and  6.2  of  [9]  for  tensor  product  and  isotropic 
Smolyak  sparse  grids  built  on  Gaussian  knots.  The  error  bounds  presented  in  The¬ 
orems  4.1  and  6.2  of  [9]  and  Theorem  3.13  of  [85]  are  bounds  on  the  L2p(T ;  V)  error. 
These  error  bounds  are  derived  by  first  determining  the  bound  (3.2.4).  Since  p  is  a 
probability  distribution,  the  L^(T;  V)  error  can  then  be  bounded  above  by  (3.2.4).  In 
Chapter  4,  I  will  prove  the  error  bound  (3.2.4)  for  specific  operators,  Cq. 

3.3  Collocation  for  Optimization 

In  this  section,  I  will  present  two  possible  stochastic  collocation  discretizations  of  the 
optimization  problem 

min  J{z)  :=  a(j(u(y,z),z))  (3.3.1) 

zCLZ, 

where  u(y,z)  =  u(y )  €  V  for  y  G  T  solves  (3.1.1).  The  first  approach  is  to  ap¬ 
proximate  the  solution  of  (3.1.1)  using  stochastic  collocation  and  plug  the  discretized 
solution  into  the  objective  function.  This  is  a  common  technique  in  PDE  constrained 
optimization  and  is  a  “discretize  then  optimize”  approach.  The  second  approach  is 
to  approximate  the  map, 

y  e  r  *-+j(u(y),z), 
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using  the  approximation  operator,  Cq.  This  also  is  a  “discretize  then  optimize” 
approach,  but  is  equivalent  to  an  “optimize  then  discretize”  approach. 

To  discretize  (3.3.1)  using  the  first  method,  one  approximates  the  solution  to 
(3.1.1)  using  stochastic  collocation  and  the  approximation  operator,  Cq,  i.e. 

Q 

uq{v\  z)  =  (£ Qu(z))(y )  =  ^2  Pk{yMyk,  z) 

k= 1 

where  u(yk)  '■=  u(yk ;  z)  e  V  for  k  —  1, . . . ,  Q  solves 


e(u(yk),z;yk)  =  0  for  k  =  1, . . . ,  Q. 


(3.3.2) 


Then,  plugging  the  approximate  solution,  UQ(y ;  z),  into  the  objective  function  results 
in  the  following  discretization  of  (3.3.1) 


mm 


Jq(z)  :=  a{j{uQ{y\ z),z))  =  a  j  (  ^  Pk(y)u(yk ;  z),  z 


Q 


(3.3.3) 


k=\ 


The  adjoint  calculus  of  Section  2.3  is  easily  extended  to  (3.3.3)  yielding  the  derivative 


Q 


j'q  w  =  Es  z(u(yk]  z),  z]  yk)*Xk  +  E  \/a(j(uQ(y;  z),  z))jz(uQ(y;  z),  z) 


k=  1 


where  \k  =  A k(z)  G  W*  solves  the  adjoint  equation 


eu{u(ykz ),  z\  yk)*Xk  +  E  Pk(y)\7a(j(uQ(y;  z),  z))ju(uQ(y;  z),  z) 


=  0 


(3.3.4) 


for  all  k  —  1, _ ,  Q.  Recalling  the  infinite  dimensional  parametrized  adjoint  equation, 

(2.3.2),  from  Section  2.3, 


eu(u,z;y)*A  +  Va(j(u,z))ju(u,z)  =  0  for  a.e.  y  e  T,  (3.3.5) 


it  is  not  clear  that  (3.3.4)  corresponds  to  a  discretization  of  (3.3.5),  especially  when 
sparse  grid  operators,  Cq,  are  used.  Furthermore,  notice  that  the  adjoint  equation 
(3.3.4)  requires  the  explicit  knowledge  of  the  polynomials  Pk  for  k  —  1, . . . ,  Q.  For  the 
class  of  approximation  operators,  Cq,  considered  in  this  thesis,  the  polynomials,  Pk 
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for  k  —  1, . . . ,  Q,  are  challenging  to  compute  (see  Chapter  4  for  a  detailed  description 
of  the  operator  Cq  used  in  my  computations).  Therefore,  the  discretized  optimization 
problem,  (3.3.3),  will  not  be  used. 

Instead  of  approximating  the  solution  of  (3.1.1)  as  uq  =  Cqu  and  substituting  uq 
into  the  objective  function,  I  will  approximate  the  map 

y  e  r  *-+j(u(y),z) 


for  any  u  G  U  using  Cq.  This  again  is  a  “discretize  then  optimize”  approach  where 
instead  of  discretizing  the  constraint,  I  have  discretized  the  objective  function.  To 
do  this,  I  will  replace  a  with  oq  :=  a  o  Cq.  This  substitution  yields  the  optimization 
problem 


Q 


min  JQ(z)  :=  crQ(j(u(y;  z),  z))  =  a  l  Pk(y)j(u(yk]  z),  z) 

\  k= 1 

where  u(yk',  z)  =  u(yk)  G  V  solves 

e(u(yk),z;yk)  =  0  V  k  =  1, . . . ,  Q. 
Furthermore,  the  derivative  of  Jq(z )  is 

Q  f 

Jq(z^>  =  '^k  1  ^(u(yc,  z),z;  VkY^k  +  jz(u(yk;  z),  z) 


k=  1 


(3.3.6) 


where  Cl  =  E 


V(J(  Y^=ipe(y)j(u(ye',z),z))Pk(y)  and  A k(z)  =  Xk  G  >V*  solves 


eu(u(yk)  z),  z;  yk)*\k  +  ju(u(yk ;  z),z)  =  0  V  k  =  1, . . . ,  Q. 


Note  that  if  cr  =  E,  then  Ck  are  the  quadrature  weights  associated  with  the  ten¬ 
sor  product  quadrature  rule,  Eg,  i.e.  tok  :=  Ck  =  E[Pk],  In  this  case,  the  notions 
of  “optimize  then  discretize”  and  “discretize  then  optimize”  coincide.  Using  this 
discretization  scheme,  the  discretized  state  and  adjoint  equations  correspond  to  the 
stochastic  collocation  discretization  of  the  true  state  and  adjoint  equations,  (2.2.4) 
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and  (2.3.2)  respectively.  Although  this  discretization  scheme  is  consistent,  there  is 
one  apparent  pitfall  of  using  this  general  tensor  product  discretization.  The  pitfall 
is  that  the  cubature  weights  associated  with  Eq  may  not  all  be  positive.  Thus,  it  is 
possible  that  Jq(z )  is  not  convex  even  if  J(z)  is. 

3.4  Collocation  Error  Bounds  for  Optimization 

In  discussing  the  discretization  of  the  optimization  problem  (3.3.1),  it  is  crucial  to 
understand  error  between  a  given  discretization  and  the  true  (infinite  dimensional) 
solution.  In  this  section,  1  will  prove  an  error  bound  for  the  discretized  problem, 
(3.3.6)  corresponding  to  the  test  problem  presented  in  Section  2.1.  1  will  first  prove 
the  error  bound  for  the  risk  measure,  <j(Y)  =  if[Y'],  then  I  will  generalize  this  result 
to  other  common  risk  measures. 

3.4.1  Minimizing  the  Expected  Value 

The  expected  value  risk  measure  results  in  the  linear  quadratic  optimal  control  prob¬ 
lem 

min  J(z)  :=  ^e\\\Qu(z)  -  q\\2n 1  +  ||H||  (3.4.1) 

zez  z  l  J  z 

where  u(y)  =  u(y,z)  G  V  for  all  y  G  T  solves 

A(y)u(y)  +  B(y)z  +  b(y)  =  0  Vt/Gh 

Recall  here  that  Z  and  7i  are  Hilbert  spaces,  and  V  and  IV  are  Banach  spaces. 
Furthermore,  Q  G  £(V,7i)  is  an  observation  operator,  q  G  7i  is  the  desired  state, 
A (y)  G  £(V,  W)  for  all  y  G  T  is  the  state  operator,  B(y)  G  C(Z,  W)  for  all  y  G  T 
is  the  control  operator,  and  b (y)  E  W  for  all  y  G  T  is  an  inhomogeneity.  Moreover, 
a  =  E  :  LpiT)  — >  M.  Therefore,  the  state  space  is  U  =  L2p{T\V)  and  the  Lagrange 
multiplier  space  is  y*  =  L2p{Y\  V)  =  U. 
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The  error  analysis  presented  here  is  centered  around  the  particular  form  of  the 
derivatives  of  J(z)  and  Jq{z).  As  such,  I  will  now  recall  the  specific  forms  of  these 
derivatives.  Since  cr(Y)  =  E[Y ]  is  Frechet  differentiable,  J(z )  is  also  Frechet  differ¬ 
entiable,  and,  as  seen  in  Section  2.3,  J(z)  has  a  gradient 


V  J(z)  =  az  +  w(z) 


where  w  =  w(z)  £  Z  solves 


Rw  =  E[B*(y)X{y-z)] 


and  A (y)  =  A(y;  z)  £  W*  for  all  y  £  T  solves  the  adjoint  equation 


A*(y)Kv)  +  Q*{Qu(y-,z)  -  q)  =  o  Vye  r. 


(3.4.2) 


Moreover,  employing  the  discretization  in  (3.3.6)  results  in  the  following  discretized 
gradient 

V Jq(z )  =  az  +  wQ(z ) 

where  wq  =  wq(z)  £  Z  solves 

Q 

Riuq  =  Eq[B*X(z )]  =  ukB*k\k(z ), 


k=  1 


Afc  =  A k(z)  £  W*  solves  the  adjoint  equation 


A*kXk  +  Q*(Qtifc(z)  -  q)  =  0  V  k  =  1, . . .  ,Q, 


and  uk  =  uk(z )  £  V  solves  the  state  equation 


A kuk  +  Bkz  +  bk  =  0  V  k  =  1, . . . ,  Q. 


Here,  note  that  since  uk(z)  =  u(yk,z)  where  u(y;  z)  solves  e(u(y),z-,y)  =  0,  one  also 
has  that  A k(z)  =  X (yk,  z)  where  A (y,  z)  solves  the  adjoint  equation,  (2.3.2).  Now,  since 
R  is  the  linear  operator  representing  the  Z  inner  product,  R  is  a  positive,  invertible 
operator,  i.e.  there  exists  a  bounded  inverse  R_1  £  C(Z*,Z),  and  w  =  w(z)  £  Z 
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can  be  written  as  w(z)  =  R_1£’[B*(r/)A(|/;  z)\.  Similarly,  for  the  discretized  problem, 
wq(z )  =  R-1.Eq[B*A(,z)].  Computing  the  difference  between  the  true  gradient  and 
the  discretized  gradient,  thus  gives 

VJ(z)  -  VJQ(z )  =  w(z)  -  wQ{z)  =  R -\E  -  Eq)[B*X(z)].  (3.4.3) 

This  clearly  shows  that  the  error  in  the  gradient  vectors  is  controlled  by  the  error 
associated  with  the  quadrature  operator,  Eq,  or  equivalently  by  the  error  associated 
with  the  interpolation  operator,  jCq. 

Theorem  3.4.1  Suppose  z*  G  Z  satisfies  the  first  order  necessary  conditions  for 
(3.3.1)  and  suppose  Zq  G  Z  satisfies  the  first  order  necessary  conditions  for  (3.3.6). 
Then,  the  error  between  z*  and  zfi  satisfies 

a\\z‘  ~  4 IU  <  sfllR-'BWy.)  -  £gR-‘B*A(4)|U' . 

Proof:  Since  J(z)  is  uniformly  convex,  it  satisfies  the  inequality 

all*'  -  4111  <  (VJ(z')  -  VJ(4),V  -  z-Q)z, 

and  the  Cauchy- Schwarz  inequality  implies 

a||V-4IU<||VJ(4-VJ(4)|U. 

Now  since  z*  E  Z  is  a  first  order  critical  point  of  J(z*)  and  ZqEZ  is  a  first  order 
critical  point  of  Jq(z ),  i.e.  WJ(z*)  =  0  and  VJq(zq)  =  0,  the  above  inequality  can 
be  rewritten  as 

a||V  -  Zq\\z  <  ||  VJq(4)  -  vJ(4) IU. 

Plugging  (3.4.3)  into  the  right  hand  side  of  the  above  inequality  gives 

all**  -  411*  S  llR^‘(B  “  £«) [B*A(4)]|U, 
and  an  application  of  Jensen’s  inequality  yields  the  desired  result.  □ 

From  this  theorem,  the  following  corollary  follows  directly  from  Assumption  3.2.4. 


55 


Corollary  3.4.2  Suppose  the  solution  to  the  adjoint  equation,  X  G  U,  satisfy  As¬ 
sumption  3.2.1  and  Cq  satisfies  Assumption  3.2.4 ■  Then  there  exists  positive  con¬ 
stants  C  =  C(p(zq),  a)  and  v  such  that 

V  -  zq\\z  <  CQ-r 

Proof:  This  is  a  consequence  of  Theorem  3.4.1  and  the  error  bound  in  Assump¬ 
tion  3.2.4.  Assumption  3.2.4  is  applicable  due  to  the  tensor  product  nature  of  y*  =U 
(see  Section  2.5  for  more  details).  □ 


Remark  3.4.3  Recall  the  linear  quadratic  control  problem  described  in  Section  2.1. 
For  this  model  problem,  A  is  defined  by 

(A (y)v,w)v*y=  /  e(y,  x)'Vv(x)  ■  \7w(x)dx  V  v,  w  G  V 
J D 

and  is  uniformly  bounded  above  for  all  y  G  T  because  e  G  L°°(r  x  D).  Furthermore, 
A  (y)  has  an  almost  everywhere  bounded  inverse  because  of  the  uniform  ellipticity 
assumption  (2.1.11).  On  the  other  hand,  b(?/)  =  0  and  B  is  independent  of  y  G  T . 
Therefore,  if  there  exists  C  >  0  and  b  <  oo  such  that 

dne 


dy\ 


:(y,x)<Cb* 


for  almost  all  (y,x)  G  T  x  D  and  for  all  n  G  N,  then  Theorem  3.2.2  ensures  that 
the  solution  to  the  state  equation,  u,  has  an  analytic  extension.  Moreover,  these 
conditions  and  the  analyticity  ofu  guarantee  that  the  solution  to  the  adjoint  equation, 
X,  also  has  an  analytic  extension.  Thus,  Corollary  3-4-2  applies  to  this  test  problem. 


3.4.2  Minimizing  the  Mean  Plus  Semi-Deviation 

The  expected  value  problem  (3.4.1)  is  typically  not  a  sufficient  reformulation  of  (2.2.9). 
In  many  engineering  applications,  a  design  or  control  holding  “on  average”  is  unac¬ 
ceptable.  It  is  often  necessary  to  account  for  tail  probabilities  and  extreme  events. 
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Such  “robust”  optimization  problems  are  formulated  using  risk  measures.  I  will  now 
generalize  the  results  of  Theorem  3.4.1  and  Corollary  3.4.2  to  the  mean  plus  semi¬ 
deviation  coherent  risk  measure.  Recall  that,  in  general,  the  mean  plus  semi-deviation 
of  order  q  is  defined  as 

r  1 1/q 

CT,(r):=B[r]  +  cB[[r-i![yK 

for  c  G  [0, 1]  and  crq(Y)  can  be  written  in  terms  of  the  auxiliary  function 

aq  :  R  x  Lqp(T)  -»■  R 

as  aq(Y)  =  aQ(E[Y],Y)  where 

r  1 V? 

aq(t,  Y)  :—t  +  cE  [Y  —  t]q+  . 

This  characterization  will  allow  for  easy  derivative  computation.  Since  crq(Y )  satisfies 
Assumptions  2.2.11  (i.e.  a q(Y)  is  coherent)  and  the  deterministic  objective  function 
is  separable,  the  reformulated  objective  function  corresponding  to  mean  plus  semi¬ 
deviation  can  be  written  as 

J(z)  =  ^i(||Q«(a)  -  q\\h)  +  f  Iklll- 

This  objective  function,  J(z),  is  Hadamard  differentiable  and  admits  a  gradient  since 
Z  is  a  Hilbert  space.  This  Hadamard  differentiability  follows  from  the  discussion  on 
page  16  of  [100].  Define  the  set 

y(-)  :=  {«/  e  r  :  WQu(y-,z)  -  q\\2n  >  E  ||Qu(^)  -  qfn  }, 

and  let  Xy(z)  denote  the  characteristic  function  of  the  set  y(z),  then  the  gradient  in 
Z  corresponding  to  the  Hadamard  derivative  is  given  by 

YJ(z)  =  az  +  RT1  j£[B*A]  -  cE[Xy{z)}E[B* X]  +  cE[Xy{z)B*\\ 

=  az  +  R_1/s[B*A]  +  cCov^xy(2),  B*A^ 


(3.4.4) 
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where  A  =  A(z)  solves  the  adjoint  equation  (3.4.2).  Here,  Cov(Yi ,  Y2)  denotes  the 
covariance  between  the  random  variables  Y\  and  Y2.  The  covariance  is  defined  as 

Cov(Yj,  Y2)  =  E[(Yi  -  E[Y 1])(Y2  -  E[Y2})}  =  E[Y,Y2 }  -  £[Y7]£[Y7]. 

To  discretize  this  objective  function,  J(z ),  replace  ||Qu(^)  —  with  its  inter- 
polant,  i.e. 

Jq(z)  =  ^0i(£qIIQ«(*)  -  Q\\n)  +  ||MII 

where  jCq  is  an  appropriate  interpolation  operator.  The  linearity  of  Cq  and  the 
gradient  (3.4.4)  implies  the  following  form  of  the  gradient  of  Jq(z) 

VJq  (z)  =  az  +  R  1|e[£qB*A]  +cCov(xyQW,£QB*A)|  (3.4.5) 

where 

yQ(z)  ■=  {y  e  T  :  £Q\\Qu(y;z)  -  q\\2n  >  E  £q||Q«(z)  -  qfn  }. 

As  was  done  in  the  proof  of  Theorem  3.4.1,  I  will  need  to  quantify  the  error  be¬ 
tween  V  J(z)  and  XJq(z)  at  the  same  control  value.  To  do  this,  first  notice  that  the 
covariance  operator  satisfies 

Cov(Yi,  Y2)  -  Cov(Xu  X2)  =  Cov(Yj,  Y2  -  X2)  +  Cov(Yj  -  XUX2) 

for  any  random  variables  1) ,  Y2,  Xx,X2  e  L2(T).  Applying  Holder’s  inequality  to  this 
equality  gives 

|Cov(Yi,  Y2)  -  Cov(X1,X2)|  <1117  -  A[lj]  11^^)11(17  -  X2)  -  E[Y2  -  X2]\\Lj>{r) 

+  \\X2  -  S[X2]||L»(r)||(Y1  -  A7)  -  E\YX  -  X1]||Li(r). 

(3.4.6) 

Now  notice  that  an  application  of  the  triangle  inequality  to  ||Y  —  A[Y']||  for  any  Lq{T) 
norm,  ||  •  ||,  gives 

\\y  -  BMII  <  Ill'll  +  ll-E[y]||  =  Ill'll  +  \E[Y]\  <  2\\Y\\. 
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Therefore,  multiple  applications  of  the  triangle  inequality  to  the  right  hand  side  of 
(3.4.6)  gives 

|Cov(y1,y2)-Cov(x1,x2)|  <  4{||yi||£o0(r)||y2-^2|Ui(r)+||^2||L~(r)||Vi-A'1||Li(r)}. 

Plugging  in  Yx  =  Xy{x),  Y2  =  B *p,  =  XyQ(x),  and  A"2  =  CQB*p  gives  the  following 

error  representation  for  the  gradients 


||VJ(z)  -  VJq(z)\\z  <E  HR-'B^A^)  -  B*A(^)||2: 


sup  |R"iB(j/)*A(y;  z)  -  R-1£QB(j/)*A(y;  z) 

ye  r 


ii 


+  4 


III 

where  y(z)Ayq(z)  =  (3^(^)\3^q(^))U(3^q(^)\3;(^))  denotes  the  symmetric  difference 
of  the  sets  y(z)  and  yq{z).  Notice  that  expressions  I  and  II  can  be  handled  using 
existing  error  bounds  for  the  interpolation  operator  Cq  (see  Theorem  3.4.1),  therefore 
it  remains  to  bound  III.  To  do  this,  one  must  bound  the  size  (i.e.  probability)  of  the 
symmetric  difference  y(z)Ayq(z). 

In  general,  determining  meaningful  bounds  on  the  size  of  y(z)Ayq(z)  is  not 
possible.  To  circumvent  this  issue,  it  is  convenient  to  replace  [-]+  with  a  C°°  approx¬ 
imation.  A  common  choice  of  smooth  approximation  is 


p(x,  7)  =  x  +  -  log(l  +  e  1X ) 

7 

(c.f.  see  [38]  for  more  details).  This  function  has  many  nice  properties;  most  impor¬ 
tantly,  p(x,  7)  tends  to  [x]+  as  7  grows  to  positive  infinity.  Other  significant  properties 
are 

p(x,  7)  >  M+,  \p'{x,  7)|  <1,  and  \p"{x,  7)!  <  ^  VieM. 

Furthermore,  p(x,  7)  is  strictly  convex  and  strictly  increasing  (see  Lemma  1.1  in  [38]). 
The  fact  that  p"(x,  7)  is  bounded  for  all  x  €  M  implies  that,  for  fixed  7  G  (0,  00), 
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p'(x,  7)  is  Lipschitz  continuous.  Now,  define  the  C°°  approximation  of  <rq(Y)  as 

;(Y)  ■=E[Y]+cE  [p(Y  -  E[Y], 7)«]  ^ 

for  fixed  7.  The  above  analysis  can  be  replicated  by  replacing  aq  with  af,  which  gives 
the  following  gradients 

VJ(z)  =az  +  R_1£'[B*p] 

+  cR~1Cov(p'(\\Qu(y-,z)-q\\ll-E  \\Qu(z)  ~  q\\^  ,7),B*A^  (3.4.7) 

and 

VJq(z)  =az  +  R~1E[£QB*p } 

+  cR-1Cov(p'(cQ\\Qu(y]z)-q\\2n-E  £q\\Q,u(z)  -  q\\2H  ,^,£QB*\ 

(3.4.8) 

Now,  computing  the  difference  between  these  gradients  and  employing  the  properties 
of  p(x,  7)  described  above  gives  the  bound 


II VJ(z)  -  VJq(z) |U  <  I  +  4  x  II  +  27  sup  |R-1£QB(j/)*A(y; z)\ 

yer 


x  E 


IQ«(y; z)  ~  q\\h  -  £qIIQ«(-)  -  q\\ h 


(3.4.9) 


This  bound  demonstrates  the  error  in  the  gradients  is  controlled  by  the  interpola¬ 
tion  error  associated  with  jCq.  Hence,  for  this  C°°  approximation  to  the  mean  pins 
semi-deviation  problem,  one  gets  a  similar  error  bound  on  the  gradients  as  with  the 
expected  value  problem. 

Theorem  3.4.4  Suppose  z*  G  Z  is  a  first  order  critical  point  of 

1  -v  / ..  —  /  '  —  1 1 2  5  a  i 


J(z)  =  2°l[\\Qu(y,z)  -  qWh)  +  2  INI  * 


and  Zq  G  Z  is  a  first  order  critical  point  of 


Mz )  =  \v1(£q\\Qu(¥,z)  ~  q\\n)  +  |lMli- 
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Furthermore,  suppose  the  assumptions  of  Corollary  3-4.2  hold.  Then  there  exists 
positive  constants  C  =  C (u(zq) ,  p(zq) ,  7,  a)  and  v  such  that 

II z* -z*q\\z<CQ-v. 

Proof:  J(z)  is  uniformly  convex  since  crJ(Y)  is  convex  and  increasing.  Therefore, 
the  error  in  the  controls  can  be  bounded  by  the  errors  in  the  gradients,  V  J(z*q)  and 
VJq(  Zq)  as  done  in  the  proof  of  Theorem  3.4.1.  Using  the  bound  (3.4.9)  and  the 
Assumption  3.2.4  gives  the  desired  result.  □ 


3.4.3  Minimizing  the  Conditional  Value-At-Risk 

It  is  often  necessary  to  account  for  tail  probabilities  and  extreme  events.  One  method 
of  formulating  these  “robust”  optimization  problems  is  to  employ  the  conditional 
value-at-risk  (CVaR).  CVaR  quantifies  the  risk  on  the  tails  of  the  distribution  of  the 
objective  function  and  is  defined  as 

r  -  <]+ 

and  much  like  the  mean  pins  semi-deviation,  CVaR  can  be  written  using  the  auxil¬ 
iary  function  aq(t,Y)  as  ocvaR(V)  =  minteR  <7i(f,  Y).  It  can  further  be  shown  that 
minimizing  the  CVaR  objective  function  over  Z  is  equivalent  to  minimizing  a\(t,  Y) 
over  the  augmented  space  (t,z)  61x2  [117].  For  this  analysis,  I  will  consider  the 
optimization  problem 

=di(t,^\\Qu(y;z)  -  q\\^  +  |||^||| 

where  u(y,z)  =  u  G  U  solves  (2.2.4).  The  analysis  performed  for  the  mean  plus 
semi-deviation  risk  measure  essential  holds  for  the  CVaR  risk  measure  and  therefore 
it  will  be  critical  to  replace  [•]+  in  (TCyaR  with  p(-,y).  The  C°°  approximation  of  the 
CVaR  objective  function  is 

J{t,z)  =  af{t,^\\Qu(y,z)~q\\^  +||N||  (3.4.10) 


cU'VaRV)  =  mint  +  cE 
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where 

{Y,  t)  =  t  +  cE  p(Y  -t,  7)  . 

The  gradient  of  J(t,z )  is  thus  computed  using  the  partial  derivatives  of  J(t,z )  with 
respect  to  tel  and  z  G  Z: 

VtJ(t,z)  =1  -cE  p' (^\\Qu(y,  z)  -  q\\n  -  t ,7) 

VzJ{t,z)  =az  +  cR-1E  p'{^\\Q u(y\z)  -  q\\2n 

where  A  =  A(z)  solves  the  adjoint  equation  (3.4.2). 

Employing  the  stochastic  collocation  discretization  scheme,  one  can  write  the 
collocation  and  C°°  approximate  CVaR  objective  function  as 

JQ(t,z)  —  aj  (t,  ^CQ\\Qu(y,  z)  -  qf^j  +  |||-|||  (3.4.11) 

which  admits  the  gradient 

VtJQ(t,z)  =1  —  cE  p' [^£Q\\Qu(y,  z)  -  q\\2n  —  t, 7) 

VzJq{t,z)  =az  +  cR~1E  p' (^£Q\\Qu(y;  z)  -  qfn  -  t,  t)£qB*A  . 

Invoking  the  Lipschitz  continuity  of  £>'(•,  7)  and  Jensen’s  inequality,  the  collocation 
error  associated  with  t  component  of  the  gradient  is  bounded  by 

I VtJ{t,z)  -  VtJQ{t,z) |  <cE  \p' (^£Q\\Qu(y,  z)  -  qfn  -t,y) 

-  p'(^IIQ  u(y,  z)  -q\\n~tn)\ 

~Ye  -q\\h~  \\Qu(y’,z)  -q\\h\  ■ 

Therefore,  the  collocation  error  in  the  t  component  of  the  gradient  of  J(t,z )  is  con¬ 
trolled  by  interpolation  error.  Similarly,  the  collocation  error  associated  with  the  z 
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component  of  the  gradient  is  bounded  above  in  the  following  manner 


||V*J(t,z)  -  VzJQ(t,z) \\z  —c 


R1# 


p'{\ HQ«(y;  z)^q\\^i-t,'y)B*X 


-  p'( z)  -q\\h  -*>t)£qB*a 


<cE 


p'(|||Q«(3/;  2)  -  all «  -  *,t)|IIR-_1(b*a  -  £„b*a)|U 


+  _t’7) 

-  p'(^£ellQ“(»;  *)  -  9ll«  -  *,  7)  I  ||R-‘£qB-A|U 


<cE 


+ 


R-i(B*A-£QB*A)b 


c7suPyer  II R  ^qB*(j/)A(j/)||j 


(3.4.12) 


x  E 


\Qu(y,z )  -  g||^  -  £q||Qu(|/;p)  -  g||^| 

J 

(3.4.13) 

Clearly,  the  error  associated  with  the  z  component  of  the  gradient  can  also  be  con¬ 
trolled  by  interpolation  error.  This  brings  about  the  following  error  bound. 


Theorem  3.4.5  Suppose  ( t*,z *)  G  M  x  Z  is  a  first  order  critical  point  of  (3.4.10) 
and  (t*Q,z*Q)  G  R  x  Z  is  a  first  order  critical  point  of  (3.4.11).  Furthermore,  suppose 
the  assumptions  of  Corollary  3-4.2  hold.  Then  there  exists  positive  constants  C  = 
C(u(zq),  \(zq),  7,  a)  and  v  such  that 

\\z*  -  z*Q\\z  <C{Q-V  +  \t*  -t*Q\). 

Proof:  J(t,z )  is  uniformly  convex  with  respect  to  z  G  Z  and  strictly  convex  with 
respect  to  t  G  M  since  Y )  is  convex  and  increasing  in  both  arguments.  Therefore, 
the  error  in  the  controls  can  be  bounded  by  the  errors  in  the  gradients,  V  J(tq,  Zq) 
and  YJQ{t*Q,  Zq)  as  done  in  the  proof  of  Theorem  3.4.1,  i.e. 

*  -  4IU  <||VV(i-,  V)  -  vj{f,z'Q)  |U 
=l|VVa(«a.4)-vp(f,4)|U 

<\\VzMCQ,rQ)-vJ(eQ,z'Q)\\z  +  \\vJ(t'Q,z'Q)-x>J(e,z'Q)\\z. 


a\\z 
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Applying  the  bound  (3.4.13)  and  Assumption  3.2.4  to  the  first  term  on  the  right  hand 
side  and  invoking  the  Lipschitz  continuity  of  p'  to  handle  the  second  term  one  gets 


a\ 


\z-  -  41,  <  C(«(4),  A(4),7)Q-  +  ^E\ ||R-1BU(4)||,|  \r  -eQ\. 
This  gives  the  desired  result. 


□ 


Remark  3.4.6  To  obtain  a  meaningful  error  bound,  one  also  needs  to  quantify  the 
error,  \t*  —t*q\,  in  terms  of  Q.  Currently,  this  is  future  work.  Note  that  the  techniques 
used  to  bound  the  error  \\z*  —  z*q\\z  cannot  be  applied  to  the  error  |£*  —  tg\  since  J(t,  z ) 
is  only  strictly  convex  with  respect  to  t  e  M. 


Chapter  4 


High  Dimensional  Interpolation 


In  this  chapter  I  will  review  and  extend  a  general  class  of  high  dimensional  interpola¬ 
tion  and  quadrature  operators.  These  operators  are  essential  for  efficient  application 
of  the  stochastic  collocation  discretization  technique  for  the  solution  of  the  parametric 
equation  e(u(y),z;y )  =  0  for  y  e  T  when  T  is  a  high  dimensional  parameter  space. 
I  will  first  review  a  few  standard  results  concerning  one  dimensional  Lagrangian  in¬ 
terpolation.  Then,  I  will  extend  the  one  dimensional  interpolation  operators  to  a 
general  class  of  high  dimensional  interpolation  operators  built  on  the  concepts  of 
Smolyak’s  Algorithm  [110]  and  weighted  tensor  approximation  [120,  52,  15].  Finally, 
I  will  present  an  extension  of  the  dimension  adaptive  ideas  of  Gerstner  and  Griebel 
[52]  for  the  approximation  of  high  dimensional  integrals. 

4.1  One  Dimensional  Interpolation 

The  one  dimensional  interpolation  operators  considered  in  this  section  will  be  denoted 
Ck,j  :  C°fc(Tfc)  — >  Pmfc>j_1(Tfc)  for  k  —  1, . . . ,  M.  Here,  Ckj,  3  —  1,  2, ... ,  denotes  a 
sequence  of  one  dimensional  interpolation  operators  exact  for  polynomials  of  degree 
rrikj  —  1  or  less  and  {rrq0J}jh]  is  a  monotonically  increasing  sequence  of  positive 
integers  with  =  1.  Moreover,  A =  {yk,j,q}™=i  Cb  will  denote  the  finite  set  of 
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distinct  interpolation  knots  corresponding  to  the  operator  Ckj.  For  any  /  G  C°pk{Tk)} 
the  one  dimensional  interpolation  operators  considered  here  are  defined  as 


mk,j 

(£kjf)(y)  =  ^2£k,jMf{ykj,q)- 

9=1 

The  functions  £k,j,q  G  Pm'=.3-1(rfe)  are  taken  to  be  the  Lagrange  interpolating  polyno¬ 
mials  built  on  the  one  dimensional  knots,  A4j,  i.e. 


mk,j 

V- k,j,q{Uk )  = 

s= 1 


Vk  Vk,j,s 
Uk,j,q  Vk,j,s 


s¥zq 

Other  one  dimensional  interpolating  polynomials  such  as  piecewise  polynomials  or 
splines  are  also  applicable  to  the  tensor  product  constructions  which  follow,  but  I  will 
restrict  my  attention  to  Lagrange  interpolation. 

The  norm  of  the  Lagrange  interpolation  operator,  Ck,j,  is  known  as  the  Lebesgue 
constant ,  i.e. 


A Pk,j  '■=  HAj'lkp  for  1  <  p  <  oo, 
where  ||  •  \\k,p  denotes  the  operator  norm 


||-£fcj||fc,p  sup  l|£fc,j/||LL  (iyy 

For  the  remainder  of  this  thesis,  I  will  restrict  my  attention  to  the  case  where  p  =  oo. 
The  associated  Lebesgue  constant  satisfies 

Wlk,j 

Ak,j  ■=  A \£k,j,q(y)l 

9=1 

Since  A^j  is  dependent  on  the  choice  of  interpolation  knots,  it  is  valid  to  ask  “what 
is  the  optimal  choice  of  knots,  A and  “how  does  the  Lebesgue  constant  for  these 
optimal  knots,  A  £  behave.”  It  can  be  shown  that  the  Lebesgue  constant,  A  £ 

satisfies  the  lower  bound 

2  2  /  4\ 

A k,j  >  A*k  j  >  -  log  (mkj.  -  1)  +  -  ( 7  +  log  -  ) 

J  71  71  V  71/ 
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where  7  «  0.5772  is  Euler’s  constant  (c.f.  see  the  remark  on  page  701  of  [30]).  A 
consequence  of  this  result  is  that  j  grows  unboundedly  as  j  increases  toward  infinity. 

A  simple  consequence  of  these  definitions  is  any  interpolation  operator,  Ckpj  gives 
the  following  interpolation  error  for  all  /  G  Cpfe(Tfc) 

11/ -  £k,jf\\c°k (rk)  <  ||  f  -  P  +  £k,j(p  ~  f)\\c°k(rk) 

<  11/  ~P\\c°k(rk)  +  \\£k,j(p~  f)\\c°k(Tk) 

for  any  p  E  (rfc).  Therefore, 


11/  -  £kjf\\c°k (rk)  <  (1  +  A k,j) 


per, 


inf  fr  Jl  f 

%k,j “i'1  k> 


Pllc?t(r»). 


(4.1.1) 


Thus,  to  fully  characterize  the  interpolation  error,  one  must  be  able  to  bound  the 
error  between  /  e  C{]()k  (T//  and  its  best  approximation  in  (T On  the  other 

hand,  the  error  (4.1.1)  is  highly  dependent  on  the  size  of  the  Lebesgue  constant,  Afcj, 
which  depends  on  the  choice  of  interpolation  knots,  In  addition  to  (4.1.1),  the 

error  corresponding  to  the  interpolation  operator,  Ckj,  satisfies 


||Ifc  ^fcj||fc,oo  1 T  A kj  (4.1.2) 

where  I*  denotes  the  identity  operator  on  C°fc(Tfc)  (c.f.  see  Equation  8  and  the  proof 
that  follows  in  [97]).  This  also  gives  the  upper  and  lower  bounds  on  the  difference 
between  two  consecutive  interpolation  rules,  Akj  :=  Ckj  —  1, 


|Afcy  Afcj_i|  P  ||Afcj||^i00  P  A k  j  T 

Another  property  of  these  knots,  A4j,  that  will  be  essential  for  the  high  dimen¬ 
sional  generalizations  of  these  one  dimensional  interpolation  operators  is  whether  or 
not  these  knots  are  nested. 


Definition  4.1.1  The  one  dimensional  interpolation  abscissa  are  nested  if 


A4y  c  A4j+1  Vj. 
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They  are  weakly  nested  if 

Mk ,j  n  Afcj+i  7^  0>  but  Mk,j  ft-  A/fcj+i  Vj. 

See  table  4.1  for  common  choices  of  one  dimensional  abscissa.  Nested  knots  and  to 


Abscissa 

Domain 

Weight 

mi 

Nested? 

Clenshaw-Curtis 

[-1,1] 

p(y)  =  i 

2*_1  T  1 

Yes 

Gauss-Patterson 

[-1,1] 

p(y)  =  i 

2i+1  -  1 

Yes 

Gauss- Legendre 

[-1,1] 

p{y)  =  i 

2i+1  -  1 

Weakly 

Gauss-Hcrmite 

(  oo,  +oo) 

p(y)  =  e~y~ 

2i+1  -  1 

Weakly 

Genz-Keister 

( — CX),  +oo) 

p(y)  =  e~y2 

{1,3,9,19,35,. 

..}  Yes 

Gauss- Legendre 

[0,  Too) 

p(y)  =  e~y 

2i+1  -  1 

No 

Table  4.1:  Common  one  dimensional  abscissa  with  exponential  growth  rules,  m,. 

some  extent,  weakly  nested  knots  will  give  severe  reductions  in  the  number  of  high 
dimensional  interpolation  knots  required  to  achieve  a  desired  accuracy. 

4.1.1  Interpolation  and  Analytic  Functions 

The  Stone- Weierstrass  Theorem  implies  that  the  space  of  polynomials  is  dense  in 
cjum  m-  see  Theorem  4.45  in  [49]);  that  is,  any  function,  /  e  C°fe(Tfc)  can  be 
approximated  arbitrarily  closely  by  a  polynomial.  This  fact  does  not  ensure  that  a 
given  sequence  of  interpolation  operators  converges  for  all  members  of  C°pk{Yk)-  In 
fact,  for  each  Ckij  there  exists  a  function  /  e  C°pk{Yk)  for  which  ( Ck,jf)(y )  diverges 
almost  everywhere  (e.g.  Runge’s  function  for  equi-distant  interpolation  knots).  This 
result  is  presented  as  Theorem  5.3  in  [45].  This  is  no  longer  the  case  when  the  domain 
of  the  operator  Ck,j  is  switched  from  /  e  C°pk  (Tfc)  to  the  analytic  functions  defined 
on  an  elliptic  disc  in  the  complex  plane.  This  function  space  will  be  denoted  by 

A(Tklrk)  '■=  {/  :  C  — >  R  :  f  is  analytic  on  Drk(Yk)}  . 
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The  set,  Drk(Tk),  is  an  open  elliptic  disc  containing  T*,  with  semi-axis  sum  r k  >  1  (e.g. 
see  (3.2.1)  for  an  example  of  Drk (Tfc)  with  T k  =  [—1, 1]).  The  analyticity  assumption, 
Assumption  3.2.1,  ensures  that  for  fixed  directions,  Tk,  the  solution  of  the  PDE  with 
uncertain  coefficients  (2.2.4)  is  a  member  of  A/T*,,  rk)-  One  result  of  particular  interest 
to  this  work  is  Theorem  8.1  in  Chapter  7.8  of  [45].  This  theorem  concerns  the  best 
approximation  of  an  analytic  function,  /  G  A(rk,rk)  with  algebraic  polynomials.  A 
consequence  of  the  proof  of  this  theorem  is  the  following  lemma. 


Lemma  4.1.2  Suppose  f  G  C°pk  (Tfc)  has  an  analytic  extension  on  the  elliptic  disc, 
D,k  (Tj.) .  Then,  the  best  approximation  of  f  by  the  algebraic  polynomials  of  degree  d, 
Pd(rfc)  satisfies 


min 

pePd(rfc) 


\\f  ~  P\\c$k(Tk)  < 


sup  I f(y)\ ■ 

y&Drk 


Combining  this  result  with  (4.1.1),  it  is  easy  to  see  that  for  any  /  G  C®k(Yk)  which  has 
an  analytic  extension  on  the  elliptic  disc,  Drk(Tk),  the  interpolation  error  associated 
with  the  operator,  Ckj,  satisfies 


11/  -  £k,jf\\c°k(rk)  <  (1  +  A k,j) 


Tk  - 


1) 

l  k 


sup  | f{y)\. 

y&Drk 


(4.1.3) 


The  convergence  rate  described  in  (4.1.3)  depends  on  the  sum  of  the  semi-axis  lengths, 
Tk  >  1,  of  the  ellipse,  Drk(Tk)  which  is  determined  by  the  analyticity  of  the  function 


4.1.2  Clenshaw-Curtis  Knots 


One  popular  choice  of  interpolation  knots  are  the  so-called  Clenshaw-Curtis  knots. 
These  knots  are  the  extrema  of  Chebyshev  polynomials,  i.e. 

"ir(q  -  1)' 


Vk,j,q  COS 


Wlk,j 


for  q  =  1, . . .  ,mk,i. 


Furthermore,  these  knots  can  be  made  nested  by  choosing  mk,i  to  satisfy 


rrik.i  =  1  and  mk,j  =  2-7  1  +  1  for  j  >  1, 
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Figure  4.1:  Clenshaw-Curtis  quadrature  nodes  for  levels  j  —  1, ...  ,6. 

see  Figure  4.1.  These  knots  are  quite  popular  because  they  admit  a  relatively  small 
upper  bound  for  the  Lebesgue  constant.  This  upper  bound  is 

2  2 

A k,j  <  ~  log (mkJ  -  1)  +  1  -  atj  <  -  log (mkJ  -  1)  +  1 

7 r  7 r 

for  nikj  >  2  and  some  oq  e  (0,  l/m*,j)  (see  e.g.  [30]).  In  addition,  plugging  the 
particular  form  of  mkj  into  the  bound  for  the  Lebesgue  constant  gives 

A k,j  <  -(j  ~  1)  log(2)  +  1  =  -  log(2)j  +  (1  -  -  log(2)). 

7 T  7T  7T 

Combining  this  result  with  (4.1.3)  the  interpolation  error  for  functions  /  €  C{)pk  (T^) 
with  analytic  extension  on  Drk  (r^.)  satisfies 

11/  -  £k,jf\\c°k(rk)  <  C(rk,  f)jrf3  (4.1.4) 

where  C  =  C(rk,f)  is  a  positive  constant  depending  on  rk  and  /,  but  not  on  the 
polynomial  degree,  j.  This  bound  on  the  one  dimensional  interpolation  error  also 
gives  a  bound  on  the  difference  of  two  consecutive  operators,  Akj  =  £kj  —  £k,j- i, 


Afcj/||c-ofc(rfe)  <  \\£k,jf  ~  f\\c°k(rk)  +  \\f~  £k,j-if\\c°k(rk)  <  D{rk)  f)jrk23  (4.1.5) 
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where  D  =  D(rk,  /)  is  a  positive  constant  depending  on  r*,  and  /,  but  not  on  the  poly¬ 
nomial  degree,  j.  These  difference  operators  will  be  paramount  in  the  construction 
of  general  high  dimensional  interpolation  operators. 

4.2  Tensor  Product  Polynomial  Approximation 

The  high  dimensional  interpolation  operators  considered  in  this  thesis  are  natural 
extensions  of  the  one  dimensional  operators  discussed  in  Section  4.1.  These  high  di¬ 
mensional  operators  are  built  on  the  differences  between  two  consecutive  interpolation 
operators,  i.e. 

A^j  T/,-j  where  o  d 

for  j  >  1  and  k  =  1, ,  Ad.  For  all  functions,  /  G  with  an  analytic  extension 

on  Drk(Tk),  the  difference,  A kjf,  converges  to  zero.  Furthermore,  for  k  —  1, . . . ,  M, 
these  difference  operators  satisfy  the  telescoping  sum  properties: 

n 

A fcy  £fc,i  and  ^  ^ 

3  = 1 

The  tensor  product  interpolation  operators  considered  here  are  built  on  high  dimen¬ 
sional  tensor  products  of  these  one  dimensional  difference  operators.  These  tensor 
product  difference  operators  are  defined  as 

Aj  =  Aijj  0  •  •  •  0  A M,jM 

for  multi-indices  j  =  (ji ,  •  •  - ,  Jm)  G  NAI .  The  Ad  dimensional  generalization  of  the 
one  dimensional  operator  Ck,j  is  then  given  by 

Ci-=Y1  Ai  (4.2.1) 

iex 

for  some  index  set,  X  C  NM.  The  one  goal  in  selecting  an  appropriate  index  set, 
X,  is  to  maintain  the  telescoping  sum  property  from  the  one  dimensional  case.  This 
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property  will  guarantee  certain  polynomial  exactness  and  interpolation  results  con¬ 
cerning  Lx  by  ensuring  that  similar  properties  of  the  one  dimensional  operators  hold. 
I  thoroughly  discuss  these  properties  in  Subsection  4.2.3.  These  results  will  require 
some  notion  of  admissibility  for  index  sets,  X  C  NAI .  For  clarity,  I  will  adopt  the 
following  notation  and  partial  ordering  on  NM ,  for  i,  j  e  NM , 

j  <  i  jk  <ik  V/c  =  1,  — ,M. 

Definition  4.2.1  The  set  1  C  NM  is  admissible  if  i  G  X  and  j  <  i,  then  j  G  X. 

Remark  4.2.2  The  classic  Smolyak  and  full  tensor  product  operators  are  special 
cases  of  (4.2.1).  The  Smolyak  rule  of  level  £  >  0  corresponds  to  the  index  set 

{M 

i  e  N"  :  -  1)  <  £ 

k=  1 

while  the  full  tensor  product  algorithm  of  level  i  corresponds  to  the  index  set 

Zpp{£,  M)  :=  |i  G  Nm  :  ^  max^(ifc  —  1)  <  l 

See  figure  f.2  for  a  depiction  ofZ^ypf  7,2)  and  Xjp(7,  2).  On  the  other  hand,  given 
any  mapping  g  :  NM  — >  N,  strictly  increasing  in  each  argument,  the  anisotropic  tensor 
product  operator  of  order  £  is  defined  as 

l(£,M,g):=  {ieNM  :  g(i)  <  1}  . 

Notice  that  X  subsumes  both  the  classic  Smolyak  index  set,  Tc<yg{£,  M),  and  the  full 
tensor  product  index  set,  Tpp(£,  M) ;  that  is,  the  classic  Smolyak  index  set  and  full 
tensor  product  index  are  defined  with 

M 

g{ i)  =  ^(4  -  1)  and  g(\ )  =  max  (ifc  -  1), 

fc=i  _ 


respectively. 
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Figure  4.2:  Tensor  product  (blue  and  grey)  and  Smolyak  sparse  grid  (blue)  index  sets 
for  level  i  =  7  and  dimension  M  —  2. 


The  polynomial  properties  of  the  operators,  are  often  disguised  by  the  complex¬ 
ity  in  the  definition  (4.2.1).  Hence,  it  is  often  useful  to  rewrite  (4.2.1)  as  a  linear  com¬ 
bination  of  the  tensor  product  of  one  dimensional  operators,  :=  ® 

This  reformulation  is  called  the  combination  technique  [52,  50].  Let 


Xx  (i)  = 


1  if  i  e  J 

0  otherwise 


denote  the  characteristic  function  of  the  index  set  X.  By  expanding  the  M  dimensional 
difference  operators  Aj  in  terms  of  Ck,j  and  combining  like  terms,  one  can  rewrite 
(4.2.1)  as 

£x  =  W  27  (-l)N»(i  +  z))£,.  (4.2.2) 

iex  \ze{o,i}M  / 

This  form  of  Cx  is  convenient  because  it  demonstrates  the  dependence  of  Cj  on  the  one 
dimensional  interpolation  operators.  Furthermore,  (4.2.2)  demonstrates  the  savings 
achieved  by  (4.2.1)  for  a  given  choice  of  X.  Notice  that  if  i  e  X  such  that  i  +  z  e  X 
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for  all  z  G  {0, 1}M,  then 

M  f  M  \ 

c(i):=  ^  (-l)|z|Xl(i  +  z)  =  jb(-l)*  [  =0. 

ze{o,i}M  k=o  \  k  J 

Therefore,  for  such  i  G  X,  one  need  not  compute  L{. 

It  is  worthwhile  noting  that  much  of  the  work  associated  with  the  operator  Lx 

is  directly  related  to  the  index  set,  X.  This  index  set  should  be  chosen  so  that  Lx 

is  sufficiently  accurate,  but  is  not  overly  expensive  to  compute.  To  do  this,  many 

researchers  have  considered  using  a  priori  information  to  determine  an  “optimal” 

index  set  X.  Optimality,  in  this  sense,  refers  to  minimizing  the  error  ll^ill  subject 

to  a  constraint  that  the  cost  associated  with  X  is  less  than  some  positive  constant  N. 

This  constrained  minimization  problem  is  equivalent  to  a  classic  knapsack  problem 

[13].  X  can  also  be  chosen  in  an  adaptive  fashion.  In  Subsection  4.2.2,  I  will  discuss 

the  adaptive  selection  of  index  sets,  X,  in  the  context  of  sparse  grid  quadrature. 


4.2.1  Tensor  Product  Quadrature 


The  tensor  product  operator,  Lx,  can  be  extended  to  a  high  dimensional  quadrature. 
Let  E  =  Ei  <S>  ■■■  <S>  Em  denote  the  M  dimensional  integral 

M 

p(y)f(y)dy  =  Yl  pk{yk)fk{yk)dyk  v  f  ec°pi(  <g>  ■  ■  ■  <g>  c°pm(tm). 

k=l  ■'Vk 

Associated  with  the  tensor  product  operator,  Lx,  is  the  M  dimensional  cubature 
operator  defined  by  the  composition  Ex  E  o  Lx-  The  operator,  Ex,  is  the  sparse 
grid  cubature  formula  associated  with  the  index  set,  X.  The  combination  technique 
formulation,  (4.2.2),  of  Ex  is 

Ex  =  EoLx  =  Y,[  E  (-l)|z|Xx(i  +  z)W 

iex  \ze{o,i}M  / 

Here,  E\  :=  E  o  L\  is  merely  the  tensor  product  cubature  rule  associated  with  the 
index,  i.  The  cubature  formula,  Ei}  requires  function  evaluations  at  the  nodes 

K  ■=  A/i.t!  x  •  •  •  x 
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Figure  4.3:  Admissible  index  sets  (left)  and  their  corresponding  Clenshaw-Curtis 
quadrature  nodes  (right).  The  first  is  a  tensor  product  rule,  the  second  row  is  an 
isotropic  Smolyak  rule,  and  the  third  row  is  an  arbitrary  anisotropic  rule.  The  blue 
and  grey  squares  indicate  members  of  the  index  sets.  The  blue  squares  correspond  to 
indices  for  which  c(i)  =  Z)ze{0,i}M(-l)|z|Xz(i  +  z)  ^  0. 


Recall  that  the  interpolation  operator,  £i,  is  the  tensor  product  of  one  dimensional 
Lagrange  interpolation  operators,  Cik,  built  on  the  nodes,  J\fk,ik-  The  weights  as¬ 
sociated  with  E\  are  the  Kronecker  products  of  the  corresponding  one  dimensional 
weights.  In  the  case  that  Ck,j  are  one  dimensional  Lagrange  interpolants  built  on  the 
extrema  of  orthogonal  polynomials,  the  cubature  rules  £)  are  merely  tensor  product 


75 


Gaussian  cubature  rules.  On  the  other  hand,  if  Ckj  are  one  dimensional  piecewise 
linear  polynomials  interpolants  built  on  equidistant  points,  then  E\  denotes  a  ten¬ 
sor  product  trapezoidal  rule.  Figure  4.3  depicts  examples  of  the  quadrature  points 
associated  with  Ej  for  different  choices  of  index  sets,  X. 

4.2.2  Dimension  Adaptive  Index  Set  Selection 

Gerstner  and  Griebel  [53],  and  Hegland  [58]  have  presented  an  algorithm  for  the 
adaptive  selection  of  admissible  index  sets,  X,  when  approximating  the  integral  of 
a  scalar  valued  function.  This  work  can  be  extended  to  the  general  tensor  product 
integration  problem 

J  p(y)f(y)Ay  for  /  e  C“(r;  V) 

where  V  is  some  Banach  space.  In  the  context  of  optimization  problems  governed  by 
PDEs  with  random  inputs,  such  integration  problems  arise  in  gradient  and  Hessian- 
times-a-vector  computations.  The  extended  adaptive  tensor  product  algorithm  re¬ 
quires  a  reduction  function,  7  :  V  — >  [0,  00).  One  possible  choice  of  7  is  a  norm 
defined  on  V.  See  Algorithm  (4.2.3)  for  details.  This  algorithm  generates  admissible 
index  sets  and  the  corresponding  general  tensor  product  cubature  formula.  Conver¬ 
gence  of  this  algorithm  is  dependent  on  the  quality  of  the  error  estimators,  7(Aju). 


Algorithm  4.2.3  -  Dimension  Adaptive  Sparse  Grids: 

Set  i  =  (1, . . . ,  1),  O  =  0,  A  =  {i}7  r  =  Apy  77  =  77 ;  =  7 (r) 

while  77  >  TOL  do 

Select  i  G  A  corresponding  to  the  largest  r/; 

Set  A  A  \  {i}  and  O^O  U  {i} 

Update  the  error  indicator  77  =  77  —  rji 
for  k  —  1, . . . ,  d  do 
Set  j  =  i  +  ek 
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ifO  U{j}  is  admissible  then 
Set  A  <—  A  U  {j} 

Set  r  =  AjV 
Set  ?7j  =  7 (f) 

Update  the  integral  approximation  r  —  r  +  r 
Update  the  error  indicator  rj  =  //  +  r/j 

end  i/ 
end  /or 
end  while 


The  convergence  of  Algorithm  4.2.3  is  dependent  on  the  regularity  of  the  inte¬ 
grand,  /  G  C°(T;  V)  and  is  heuristic  in  nature.  In  practice,  Algorithm  4.2.3  seems  to 
work  quite  well  and  typically  results  in  a  large  reduction  of  the  number  of  function 
evaluations  required  to  compute  the  integral.  I  will  use  Algorithm  4.2.3  as  a  means 
to  adaptively  approximate  the  optimization  problems  of  interest.  I  will  discuss  the 
application  of  Algorithm  4.2.3  in  Chapter  5. 

4.2.3  Properties  of  the  Tensor  Product  Operator,  Cj 

I  will  now  discuss  some  consequences  of  the  tensor  product  operators  defined  in  (4.2.1). 
Namely,  I  will  prove  a  result  concerning  the  interpolation  properties  of  Cx-  In  general, 
Cx  need  not  be  interpolatory.  In  this  subsection,  I  will  prove  a  result  characterizing 
in  which  cases  Cx  is,  in  fact,  interpolatory.  Furthermore,  I  will  discuss  the  polynomial 
exactness  properties  of  Cj. 

Associated  with  each  operator,  Cx  is  a  finite  set  of  interpolation  knots  often  called 
a  “sparse  grid.”  The  combination  technique,  (4.2.2),  gives  an  efficient  representation 
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of  this  sparse  grid  through  the  coefficients 

c(i)  :=  (-1)|z|Xx(i  +  z)  7^  0. 

ze{o,i}M 

As  discussed  above,  the  only  indices  of  consequence  to  the  operator,  £j,  are  those  for 
which  c(i)  7^  0,  i.e.  I  :=  {i  G  I  :  c(i)  ^  0}.  Each  tensor  product  operator  C\  for  i  e  1 
requires  function  evaluations  at  tensor  products  of  knots  from  the  one  dimensional 
interpolation  abscissa,  Mkj-  The  set  of  tensor  product  knots  (sparse  grid)  associated 
with  jCj  is  thus  defined  as 

Mi  :=  ^J(A/i,n  X  •  •  •  X  Mm,im ) •  (4.2.3) 

iex 

Notice  that  the  size  the  sparse  grid  is  bounded  above  by 

M 

\Mx\  <  ^2  •  ■  ■  ■  •  \Mm,im  \  =  TT  mk,ik 

iex  iex  fc=1 

and  further  reduction  of  the  size  of  Mi  is  achieved  if  the  one  dimensional  knots,  Mkj, 
are  nested  or  weakly  nested. 

4. 2. 3.1  Polynomial  Exactness 

To  fully  describe  the  polynomial  space  for  which  C%  is  exact,  notice  that  for 
k  =  1, . . .  ,M,  the  sequence  {'mk,j}JLi  is  associated  with  functions  m*,  :  N  — >  N  by 
the  relation,  rrik(i)  =  m,k,i ■  This  mapping  is  monotone  and  injective.  Therefore,  rrik 
has  a  left  inverse.  In  fact,  one  possible  left  inverse  of  rrik  is 

ni^1(p)  =  ruin  {?'  e  N  :  rrik(i)  >  p}  . 

This  choice  of  mjT1  is  an  increasing  function  and  satisfies  the  following  two  properties 

mfc1(mfc(*))  —  *  and  mki'rrik1  (p))  -  P 
see  [12,  9]  for  more  details.  To  simplify  notation,  let 

m(i)  :=  (mi(ii), . . .  ,mM(*M ))  and  m_1(p)  :=  . .  .,m^(pM))- 


78 


Now,  define  the  set  of  polynomial  degrees 

A(X,  m)  :=  {p  G  NM  :  m^p  +  1)  G  X}  . 

The  following  proposition  shows  that  the  tensor  product  operator,  Cx,  is  exact  for  all 
polynomials  with  degree  in  A  (X,  m). 

Proposition  4.2.4  Suppose  X  C  NM  is  admissible  and  the  corresponding  operator, 
Cx,  defined  by  (4.2.1)  is  built  on  one  dimensional  Lagrange  interpolating  polynomials. 
Then  for  any  f  G  C°(T), 


M 


Cif  G  P(T,  J,m)  :=  span  l  \\y{k  :  p  G  A(X,m),  y  G  T  >  . 


(4.2.4) 


Kk=  1 


Furthermore,  Cj  p  =  p  for  any  p  G  Pj. 

Proof:  Define  Pj(T)  :=  span  '■  P  <  j;  V  £  rj  and  notice  that  for  any 

/  G  C°(r),  one  has  A;/  G  Pm(i)-i(r).  Therefore, 

Cjf  G  span  { Up»c)->(r) 


iex 


M 


=  sparry  [^J  span  <  y^k  :  p  <  m(i)  —  1 

iEX 

M 


span<  [^J  span  <  ypkk  :  m  *(p  +  1)  <  i 
l  iex  lfc=i 
(  m  > 

span  <  iffi  :  m_1(p  +  l)  Gl 


,fc=i 


=  Px(r). 


Note  that  the  third  equality  follows  since  T  is  admissible.  This  proves  (4.2.4). 

Now,  I  will  prove  that  Cx  is  exact  for  any  /  G  P(T,X,  m).  By  linearity  of  Cx, 
one  only  needs  to  prove  that  Cx  is  exact  for  the  monomial,  f(y)  =  with 

p  G  A(X,  m).  Fix  p  G  A(X,  m),  then  for  any  i  G  X, 

M  M 

(Ai f)(y)  =  =  Y[(Ck,ik  - 

/c=l  /c=l 
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Since  Ckj  is  exact  for  polynomials  of  degree  mk{j)  —  1  or  less,  (Ck,ik  —  £k,ik-i)ykk  —  0 
whenever  p k  <  mk(ik  —  1)  —  1.  Now,  define  the  index 

i  :=  (*i,  with  ik  =  m^(pk  +  1),  k  =  1, . . . ,  M. 

The  index  i  satisfies  i  G  X  since  p  G  A(X,  m).  Moreover,  for  any  i  >  i, 

(£k,ik  -  £k,ik-i)uik  =  0  by  construction.  Therefore, 

M  M 

X\yik  =  ~~ c^-i)ypkk 

k= 1  iEX  k= 1 

M 

=  y!  —  £k,ik-i)ypk‘ 

i<I  k= 1 
M  ik 

—  n  V  y  {^k,ik  —  Ck,ik-l)ylk 

k= 1  ifc=l 
M 

=  ncKiky{k- 

k=  1 

The  third  equality  follows  because  the  index  set  defined  by 

i  G  NM  such  that  i  <  i 

corresponds  to  a  (possibly  anisotropic)  tensor  product  rule.  Finally,  since 

mk(ik)  =  mk{mk1(pk  +  1))  >  pk  +  1, 

the  one  dimensional  interpolant  £kJk  is  exact  for  the  monomial,  ypkk .  This  holds  for 
all  k  —  1, . . . ,  M  and  therefore,  £j  is  exact  on  P(T,X,  m).  □ 

Barthelmann  et  al.  [15]  and  Back  et  al.  [14]  prove  similar  results  for  the  case  when  X 
represents  classic  isotropic  and  anisotropic  Smolyak  sparse  grids. 


4. 2. 3. 2  Interpolation 


The  combination  technique,  (4.2.2),  shows  that  the  tensor  product  operator,  £j, 
the  specific  form 

Qt 


(£if)(y)  =  ^Lq(.y)f(yq) 


<7=1 


has 
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where  Qx  :  =  |Ax|,  Ax  =  {|/g}^fi  and  Xg  G  P(r,  J,  m)  for  q  —  1,... ,  Qj.  The  goal 
of  this  section  is  to  prove  under  certain  circumstances  that  Lq  is  interpolatory,  i.e. 
Lq(ys)  =  Sqs  for  q,s  =  1,...,QX.  In  general,  interpolation  is  not  guaranteed  for 
the  operator,  Cx.  This  fact  can  be  seen  by  grouping  nonzero  coefficients,  c(i),  in 
the  combination  technique  form  of  Cx.  It  is  clear  to  see  that  the  operator  Cx  is 
interpolatory  for  tensor  product  rules  (i.e.  X  =  X^pp(£,  M))  and  Barthelmann  et  al. 
[15]  prove  this  interpolation  property  for  isotropic  Smolyak  rules  (i.e.  X  =  Xgjyj(£,  M )) 
as  long  as  the  one  dimensional  interpolation  knots  are  nested.  Similar  results  holds 
for  anisotropic  Smolyak  [85] .  These  results  can  be  generalized  to  the  case  of  arbitrary 
admissible  index  sets,  X,  as  long  as  the  one  dimensional  nodes  are  nested. 

Proposition  4.2.5  If  X  C  NM  is  an  admissible  index  set  and 

AfkJ  C  A4J+i  Vj  >  0 

for  k  =  1,  •  •  • ,  M ,  then  Cx  is  interpolatory;  that  is,  for  any  f  G  Ci]p{Y), 

{£if){y)  =  f{y)  VyeNx. 

Proof:  To  prove  this  proposition,  I  will  assume  that  J  C  NM  is  an  admissible 

index  set  such  that  Cj  is  interpolatory.  I  will  then  add  an  index,  j  G  NM ,  to  J  such 
that  J  U  {j}  remains  admissible  and  prove  that  the  resulting  operator,  is  also 

interpolatory. 

Suppose  J  C  NM  is  an  admissible  index  set  such  that  C  j  is  interpolatory.  Further¬ 
more,  suppose  that  Cj  is  constructed  using  one  dimensional  interpolations  operators 
built  on  nested  interpolation  knots 

A 4j  X  A4j+1  Vj  >  0  for  k  =  1, . . . ,  M. 

Now,  suppose  the  index  j  G  NM  \  J  is  such  that  J  U  {j}  is  admissible.  I  will  consider 
two  cases:  y  G  J\fj  and  y  G  A/j  \  A fj,  where  A/j  :=  A/ijj  x  •  •  •  x  A fu,jM  and  A fj  is 
defined  above,  without  loss  of  generality,  assume 

/  =  fi  •  •  •  ■  •  fM  e  c^(ro  ®  ®  c°Pm(Tm)  c  cj(r). 
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For  any  y  G  A fj,  it  is  easy  to  see  from  (4.2.1)  that 

M 

( CJu{i}f)(y )  =  ( £jf)(y )  +  (Aj  f)(y)  =  f(y )  +  Y[(AkdJk)(yk) 

k=  1 

since  Cj  is  interpolatory.  Since  j  qL  J  and  J  U  {j }  is  admissible,  there  exists  a 
k  G  {1, . . . ,  M}  such  that  jk  >  ik  for  all  i  G  J .  By  definition  of  A fj,  ijk  G  Afk.i  for 
at  least  one  i  <  jk-  Therefore,  the  nestedness  of  the  one  dimensional  nodes  implies 
yk  G  Afk,jk-i  C  Afk)]k .  This  shows  that  A jkfk(yk)  =  0  and  hence 

M 

(£ju{j}f)(y)  =  f(y )  +  Y[(^k,jjk)(yk)  =  f(y), 

k= 1 

proving  that  £ju{i}  is  interpolatory  on  the  sparse  grid,  A fj. 

On  the  other  hand,  for  any  y  G  A/j  \  A/j-.  Define  the  set 

Jj-={ iGNM  :  i<j}. 

By  admissibility,  Jj  C  U  {j}  and  \  {j}  C  J .  With  this  definition,  one  can  write 

(CjumfM  =  {£jjm  + 

Notice  that  C j.  defines  the  anisotropic  tensor  product  operator,  £j,  which  is  interpo¬ 
latory.  Furthermore,  notice  that 

J\Jj  =  {ieJ  ■  3  k  G  {1, ... ,  M}  such  that  ik  >  jk}  ■ 

Therefore,  given  an  index  i  G  there  exists  k  G  {1, . . . ,  M}  such  that  ik  >  jk-  By 

nestedness  of  the  one  dimensional  nodes,  it  follows  that  yk  G  Afkjk  C  Afk,ik-i  C  Afk,ik- 
This  implies  that  (A; f)(y)  =  0  for  all  i  G  J  \  J-y  Hence, 

(£ju{i}f)(y)  =  {ZjJKv)  +  {£j\jsf)(y)  =  {£jf)(y)  =  f(y), 


and  thus  completing  the  proof  of  Proposition  4.2.5. 


□ 
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4. 2. 3. 3  Approximation  Error 

The  previous  results  described  the  polynomial  exactness  and  interpolation  properties 
of  the  operator,  Cj.  for  admissible  index  sets,  X.  My  goal,  now,  is  to  give  an  explicit 
representation  of  the  interpolation  error  associated  with  Lx-  First,  I  will  present  the 
necessary  notion  for  the  results  to  follow.  Secondly,  I  will  prove  a  lemma  which  gives  a 
specific  form  for  the  interpolation  error.  Finally,  I  will  extend  this  specific  error  form 
to  the  case  of  interpolating  functions  with  analytic  extensions  one  tensor  products  of 
one  dimensional  Clenshaw-Curtis  knots. 

6 
5 
4 

_ CM 

3 
2 
1 

1  2  3.4  5  6 

'l 

Figure  4.4:  The  left  image  contains  the  level  4  isotropic  Smolyak  index  set,  X,  (blue) 
and  corresponding  forward  margin,  -M(X)  (red).  The  right  image  depicts  Xm  =  X 
with  M  —  2  (blue  and  grey)  and  the  recursively  defined  X\  (grey). 

The  results  in  this  section  are  highly  dependent  on  the  notion  of  the  forward 
margin  of  the  admissible  index  set,  X.  1  will  denote  this  forward  margin  by 

M(X)  :={i  +  ek£X  :  k  =  l,...,M} 

where  ek  denotes  the  index  with  1  in  the  kth  position  and  zeros  everywhere  else.  See 
Figure  4.4  for  an  example  of  the  forward  margin.  Additionally,  the  results  will  be 


83 


dependent  on  the  following  recursively  defined  sequence  of  sets 

Xm  '■=  X  and  Xd  :=  {i  G  :  3  id+ 1  G  N  such  that  (i, i^+i)  G  Id+i\ 

for  d  =  M  —  1 ...  ,1  (e.g.  see  Figure  4.4  for  a  two  dimensional  example  of  these 
recursively  defined  index  sets).  Since  X  is  an  admissible  subset  of  NM,  Id  is  an 
admissible  subset  of  Furthermore,  I  will  use  the  functions  nd  :  Xd  — >  N  dehned  by 


7Td(i)  :=  max{id+i  G  N  :  (i,id+i)  G  Xd+1}  . 


From  this  dehnition,  it  is  clear  that  for  all  i  G  Xd,  the  index,  j  =  (i,  id)  G  Xd+\  for 
all  id  <  7Trf(i).  The  following  result  is  a  generalization  of  arguments  in  the  proof  of 
Lemma  3.2  in  [85].  The  result  in  [85],  is  specifically  for  anisotropic  Smolyak  sparse 
grid  operators  built  on  the  index  sets 

{M 

i  G  NM  :  ~  <  ?■<* 

k= 1 


where  a  :=  min \<k<M  The  corresponding  sets,  Xd,  are  denoted  by  Xd  =  Xa(£,d), 
the  margins  are  denoted  M.(Xd)  =  Xa(l,d),  and  the  mapping  nd  has  the  particular 
form, 


i  = 


1  + 


a 


-  - !) 


«fc 


Qtd+l  ad+ 1 

where  ['J  denotes  the  floor  of  a  real  number.  The  following  notation  will  be  used 
in  the  statement  and  proof  of  Lemma  4.2.6.  Let  Id  denote  the  identity  operator  on 
L2pd(Td)  and  1^  denote  the  identity  operator  acting  on  L~d(Td)  where  :=  n^iTfc 
and  pd  =  Y\dk=1  pk  for  d  =  1, . . . ,  M,  i.e.  ld  =  <S>t=i  ^ 


Lemma  4.2.6  The  error  associated  with  the  tensor  product  interpolation  operator, 
Cx,  built  on  an  admissible  index  set.  X  C  NM ,  has  the  particular  form 

M 

(I M  —  X,x)  =  ^(Rfc  (8)  I M-k) 

k=  1 
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where  the  operators,  R/c ,  are  defined  as 


E  {® 

ieXd_!  L  k= 1 


^k,ik  /■  8  (Id  £dfd-l) 


and  id  =  7rd_i(i)  +  1. 

Proof:  By  admissibility  of  X,  the  index  (i ,?m)  £  X  for  all  1  <  k  <  7TM-i(i); 

therefore,  the  following  decomposition  of  £j  is  valid 

M 

^X  =  S  ® 

iEX  k=  1 

{M— 1  'j  r  TTM-l(i) 

®  Afc>ifc  >  (8)  <  AM,d 

fc=l  J  l  d=l 

=  |  (^)  ®  AM,7rM-l(i)  (4.2.5) 

i6X,v/-i  v  fc=i  J 

Now,  plugging  the  decomposition  (4.2.5)  into  the  tensor  product  interpolation  error 
gives 

M—l 

I  I 

l  (i)  Im 


A  i 


Im  —  £x  —  Im  —  ^  S 

iexM-i  l  fc=i 

M— 1 


-  E 


Afc;ifc  >  8  Im 


i^^M-1  v  ^ — 1 

{M—l 

(££)  Afc,jfc  8  <(  Im  -  £m,ttm- i(h  ^  +  (Im-i  -  Ajm_J  8  Im 

fc=i 
M 


:  ^^(Rfc  (8)  IM-fe)  +  (Ii  —  Axi)  8)  Im-i- 


k=2 


Here,  R^  is  defined  as 


{d—l 

(££)  Afc,4 

fc=l 


(Id  £dfd-l> 


and  id  =  7T<2_i(i)  +  1.  This  proves  the  desired  result  since  Ri  =  (Ii  —  £ 


□ 
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I  will  now  use  Lemma  4.2.6  to  prove  an  upper  bound  on  the  error  of  interpolating 
functions  /  G  Cap{Y)  which  admit  an  analytic  extension  in  each  direction,  T^,  on  the 
elliptic  discs,  Drk{Yk),  using  the  tensor  product  operator,  Cj,  built  on  one  dimen¬ 
sional  Clenshaw-Curtis  interpolation  knots.  Of  course,  these  results  are  valid  only  for 
Tfc  =  [ctfc,&fc]  with  —  oo  <  Ofe  <  bk  <  oo,  for  k  —  1, ,, . . ,  M,  endowed  with  a  uniform 
distribution.  This  result  is  a  generalization  of  Lemma  3.2  in  [85]. 


Corollary  4.2.7  Suppose  f  G  L2(T)  has  an  analytic  extension  in  each  direction, 
rfc,  on  the  elliptic  discs,  Drk(Tk )■  Furthermore,  suppose  is  built  on  one  dimen¬ 
sional  Clenshaw-Curtis  interpolation  knots.  Then  the  error  associated  with  the  tensor 
product  interpolation  operator,  jCj,  satisfies  the  upper  bound 

M 

11/  -  ^x/|U°°(r)  <  ^  Rk 

k= 1 


where 


Rk :=  ^  CD*-1 


k 

and  h(i,k)  =  ^  log(r(j)2td^1, 
d=  1 


and  C,  D  are  positive  constants  defined  in  (4.1.4)  and  (4.1.5). 


Proof:  By  Lemma  4.2.6,  the  interpolation  error  satisfies 


M 


(I m  —  £i)  —  ®  f 


M-k 


k= 1 


The  principal  task  is  to  bound  the  norms  of  Rrf.  First,  notice  that  (i,  *<*)  G  A4(Id) 
for  all  i  G  Td-i-  Now, 


d- 1 


\\R<if\\co{r)  <  ^2  1  II  W^,ikf\\c°(r)  >  '  11/ -  ^dfid-J\\c°p(r) 
ield~i  l  k=  1  J 

/  d- 1  \  d- 1 

<  5^  CDd~x  Hu  ft  -  !)  exP  (  -  Z  -  l°gC rd)Td~l 


iexd_ 


.  k= 1 


fc=l 


<  n*fc)e"Mi’d)=:jRd- 

ieX(Jd) 


.  fc=i 
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This  proves  the  desired  result  since  Ri  =  (Ii— and  R\  :=  Y 
□ 


ie-M(Xi) 


C'ie-l°g(n)2i 


Corollary  4.2.7  shows  that  the  error  associated  with  the  tensor  product  interpolation 
operator,  Lx,  is  concentrated  on  the  margin  of  X.  Therefore,  in  the  adaptive  selection 
of  an  index  set,  X,  it  is  sufficient  to  control  the  error  on  the  margin  of  that  index  set, 
_A4(X).  In  this  case,  the  convergence  of  Algorithm  4.2.3  holds  when  using  the  error 
indicators 


'id—  i 


Vi  =  CDd -1!  ]  e 

.  k=  1 

for  all  indices  i  G  M.(Zd).  In  general,  this  error  bound  is  not  computable  since  one 
often  does  not  know  the  semi-axis  lengths  of  the  ellipses  in  C  for  which  a  function  has 
an  analytic  extension.  On  the  other  hand,  one  can  use  the  results  in  Corollary  4.2.7 
to  prove  usable  error  bounds  for  specific  choices  of  index  set,  X.  In  Corollary  4.2.8, 
I  restate  Theorem  3.4  from  [85].  This  result  uses  Corollary  4.2.7  to  prove  an  error 
bound  for  the  anisotropic  Smolyak  sparse  grid  based  on  the  index  set 

X  =  Xa{£,  M)  =  ji  G  Nm  :  -  IK  <  la  J  . 

To  clearly  state  this  result,  I  will  employ  the  following  notation: 

M 


,-h(i,d) 


a  :=  mm  ai,  and  A  :=  >  cu. 

k= 1 


Corollary  4.2.8  (Theorem  3.4  in  [85])  Suppose  f  G  L2(T)  has  an  analytic  exten¬ 
sion  in  each  direction,  K  on  the  elliptic  discs,  Drk(Tk )■  Furthermore,  suppose  Lx  is 
built  on  one  dimensional  Clenshaw-  Curtis  interpolation  knots  and  X  corresponds  to 
the  anisotropic  Smolyak  index  set 

{M 

i  G  Nm  :  J2(ik  -  l)a*  <  la 

k= 1 


4  r? 


with  =  log  +  sj  1  +  ) .  Then,  the  interpolation  error  associated  with  Lx 


is  bounded  by 


\\f-Lxf\\L^r)<C(a,M)ee-^ 


where 
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i  e  log(2)  a 


M,N)  := 


if  0  <  t>  <  = 

j  —  —  rv 


A 


a  l°g(2) 


P* 


otherwise 

and  the  constant  C  =  C (a,  M)  is  independent  of  the  level  i. 


Proof:  This  result  follows  from  Corollary  4.2.7  and  Lemma  3.3  in  [85]. 


□ 


The  authors  of  [85]  extend  the  error  bound  in  Corollary  4.2.8  to  be  dependent  on 
the  number  of  interpolation  knots.  In  Theorem  3.8  of  [85],  the  authors  prove  that 
under  the  conditions  of  Corollary  4.2.8,  the  interpolation  error  associated  with  Lx 
satisfy 

11/  -  £r/|U«(r)  <  C(a,  M)  Q~v  (4.2.6) 

where  v  :=  "  1"g[/\ef~1  a  whenever  0  <  i  <  The  reader  will  recall  that  the 

‘°s(^)+zJfc= i  sy  al°g\z) 

error  bound  (4.2.6)  is  exactly  the  bound  required  by  Assumption  3.2.4.  Furthermore, 
in  the  case  that  <  ^  one  sub-exponential  convergence  of  the  sparse  grid 

algorithm  (c.f.  equation  3.23  in  [85]). 

Remark  4.2.9  Similar  error  bounds  hold  for  anisotropic  Smolyak  rules  built  on 
Gaussian  interpolation  knots  (see  Theorem  3.13  in  [85]).  Moreover,  similar  error 
bounds  hold  for  isotropic  Smolyak  and  tensor  product  rules  built  on  Clenshaw- Curtis 
and  Gaussian  interpolation  knots  (see  Theorem  6.2  in  [9]  for  convergence  of  isotropic 
Smolyak  and  Theorem  f.l  in  [9]  for  convergence  of  tensor  product  rules). 


Chapter  5 


Trust  Regions  and  Adaptivity 


In  this  chapter,  I  develop  a  framework  for  the  adaptive  solution  of  optimization  gov¬ 
erned  by  PDEs  with  uncertain  coefficients.  This  framework  is  built  on  the  retro¬ 
spective  trust  region  algorithm.  I  prove  that,  with  inexact  gradient  information, 
this  modified  trust  region  algorithm  remains  globally  convergent.  In  the  trust  region 
framework,  I  use  inexact  gradient  bounds  and  a  posteriori  error  indicators  to  guide 
model  adaptation.  For  optimization  of  PDEs  with  uncertain  coefficients,  one  requires 
error  indicators  for  the  finite  element  method,  the  sparse  grids  used  in  the  stochastic 
collocation  method,  and  the  model  reduction  basis  in  the  case  of  time  dependent 
problems. 

The  trust  region  method  is  a  very  popular  and  powerful  optimization  framework 
for  solving  general  nonlinear  programming  problems  [81,  82,  94,  109,  41],  The  trust 
region  framework  offers  quite  a  bit  of  flexibility  in  exactness  of  function  evaluations 
and  gradients.  In  fact,  one  can  prove  global  convergence  of  the  trust  region  method 
when  the  function  evaluations  are  inexact  [41],  as  well  as,  when  the  gradient  eval¬ 
uations  are  inexact  [36].  This  flexibility  makes  the  trust  region  framework  an  ideal 
candidate  for  a  general  model  adaptation  framework  for  solving  PDE  constrained  op¬ 
timization  problems.  Much  work  has  gone  into  trust  region  frameworks  for  managing 
approximate  models  in  optimization  [43,  44,  3].  This  model  management  framework 
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is  the  basis  of  my  adaptive  framework. 

5.1  The  Basic  Trust  Region  Algorithm 

In  this  section,  I  will  formulate  the  basic  trust  region  algorithm  for  unconstrained 
optimization  in  a  Hilbert  space.  Let  Z  be  a  Hilbert  space.  The  basic  trust  region 
algorithm  for  solving  the  nonlinear  programming  problem 

min  J(z) 

zez 

seeks  to  compute  steps,  zk+ 1  =  zk  +  sk  G  Z,  by  solving  an  “inexpensive”  subproblem. 
In  the  classic  trust  region  theory,  inexpensive  subproblems  are  constructed  using 
second  order  Taylor  approximations  of  J(z)  centered  around  the  current  iterate,  zk. 
Taylor  approximations  are,  in  general,  accurate  in  a  small  ball  containing  the  iterate, 
Zk-  Such  a  ball  gives  rise  to  the  notion  of  a  trust  region  and  the  basic  trust  region 
subproblem  is 

min rrik(s)  subject  to  ||s||^  <  A*,  (5.1.1) 

s£Z 

for  some  inexpensive  model,  mk  :  Z  — >  M  and  mk(s )  ~  J(zk  +  s),  and  some  positive 
scalar,  A*.  >  0.  The  steps  computed  by  solving  the  subproblem  need  not  be  exact 
minimizers.  In  general,  (5.1.1)  need  not  have  a  solution  since  the  set 

13k  ■—  {s  G  Z  :  ||s||.2  A  A^} 

may  not  be  compact  for  a  general  Hilbert  space,  Z  (c.f.  page  275  in  [41]).  The  trust 
region  theory  only  requires  the  solution  of  the  subproblem  to  satisfy  the  fraction  of 
Cauchy  decrease  condition 

mk{ 0)  -mk(sk)  >  Ko||V/nfc(0)||zmin  |Afc,  (5.1.2) 

where  kq  G  (0,1)  and  (3k  =  1  +  supsgjBfe  || 3J2mk(s)\\c(z,z)-  The  fraction  of  Cauchy 
decrease  condition  (5.1.2)  ensures  that  the  decrease  in  the  modeled  objective  function, 
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rrik(s),  corresponding  to  the  approximate  solution  of  (5.1.1)  is  at  least  as  large  as  the 
decreases  of  mk(s)  corresponding  to  the  minimizer  of  (5.1.1)  in  the  negative  gradient 
(— Vmfc(0))  direction,  i.e.  the  Cauchy  point  The  basic  trust  region  algorithm  is  stated 
in  Algorithm  5.1.1. 


Algorithm  5.1.1  -  Basic  Trust  Region  Algorithm: 

1.  Initialization:  Given  mk,  zk,  Ak,  71  <  1  <  72,  and  0  <  rp  <  772  <  1. 

2.  Step  Computation:  Approximate  a  solution,  Sk,  to  the  trust  region  sub¬ 
problem  (5.1.1)  which  satisfies  condition  (5.1.2). 

3.  Step  Acceptance:  Compute  the  ratio  pk  =  where 

predfc  =  rrifc(O)  -  mk(sk)  and  ared*  =  J(zk)  -  J(zk  +  sk). 

if  Pk  >  Vi  then  zk+1  =  zk  +  sk  else  zk+i  =  zk  end  if 

4-  Trust  Region  Radius  Update: 

ifPk<V  1  then  Ak+1  e  (0,  Till-Sfell^]  end  if 

ifpk€(h cm)  then  Afc+i  e  [7i||sfe||2,  Afc]  end  if 

if  Pk  >  V2  then  Ak+1  e  [Afc,72Afc]  end  if 

5.  Model  Update:  Choose  a  new  model  mk+i(s) . 


The  standard  assumptions  on  the  objective  function,  J(z),  and  the  successive  ap¬ 
proximations  of  the  objective  function,  mk(s),  which  guarantee  first  order  convergence 
of  Algorithm  5.1.1  are 

Assumptions  5.1.2  -  Basic  Trust  Region: 

•  J  is  twice  continuously  Frechet  differentiable  and  bounded  below; 
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•  mk  is  twice  continuously  Frechet  differentiable  for  k  —  1,2,..  .; 

•  There  exists  >  0  such  that  || V2 J(z)\\c(z,z)  <  k i  for  all  z  G  Z; 

•  There  exists  k2  >  1  swc/i  that  \\V2mk(s)\\c(z,z)  <  tc2  —  1  for  all  z  £  Z  and  for 
all  k  =  1,  2, . .  .; 

•  There  exists  £  >  0  independent  of  k  such  that 

II Vmfc(O)  -  VJ(^fc)||z  <  ^min{||Vmfc(0)||2,  Afc}  (5.1.3) 

for  k  =  1,2,.... 

The  inexact  gradient  condition,  (5.1.3),  is  due  to  Heinkenschloss  and  Vicente  [60]. 
This  condition  is  slightly  stronger  than  the  classic  condition  due  to  Carter  [36] 

II Vmjt(O)  -  V J (zf) \\z  <  ^||Vmfc(0)||2 

with  f  <  1  —  772-  The  downside  of  this  gradient  condition  is  that  £  is  required  to  be 
less  than  1.  In  many  practical  applications,  one  does  not  know  exactly  the  scaling 
coefficients  in  front  of  their  error  bounds.  Thus,  (5.1.3)  is  preferable  to  compute 
with  since  one  does  not  need  to  know  £  exactly.  As  stated,  these  assumptions  are 
sufficient  to  show  that  Algorithm  5.1.1  converges  to  a  first  order  stationary  point  (c.f. 
see  [81,  41]). 

5.2  The  Retrospective  Trust  Region 

In  Algorithm  5.1.1,  the  trust  region  radius,  A*,,  is  always  updated  according  to  the 

current  model,  mfc(s);  hence,  the  new  trust  region  radius,  Afc+i,  may  be  insufficient  to 

handle  the  new  model,  m*;+i(s).  Bastin,  et  al.  created  the  retrospective  trust  region 

algorithm  to  circumvent  this  possible  pitfall  [17].  In  the  retrospective  framework, 

steps  are  accepted  according  to  the  performance  index 

_  aredfc  _  J(zk)  -  J(zk  +  sk) 

Pk  predfc  mk{ 0)  -  mk(sk ) 
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as  in  Algorithm  5.1.1,  but  the  trust  region  radius  is  updated  retrospectively  according 

to  the  new  model.  Once  a  step  is  accepted  according  to  pk,  the  model  is  updated  to 

mk+i(s)  and  one  computes  the  retrospective  performance  index 

~  J{%k)  J {z k  T  Sk ) 

Pk  mk+i(0)  -  mk+1(sk)' 

Using  this  index,  pk,  the  trust  region  radius  is  fit  to  the  new  model,  mk+i(s).  As 
with  Algorithm  5.1.1,  one  need  not  solve  the  trust  region  subproblem  exactly.  The 
approximate  solution  to  the  trust  region  subproblem  must  satisfy  the  fraction  of 
Cauchy  decrease  condition  (5.1.2).  The  retrospective  trust  region  algorithm  is  stated 
in  Algorithm  5.2.1. 

Algorithm  5.2.1  -  Retrospective  Trust  Region  Algorithm: 

1.  Initialization:  Given  mk,  zk,  Ak,  0  <  71  <  72  <  1,  Amax  >  0,  0  <  770  <  1, 
and  0  <  rji  <  r/2  <  1. 

2.  Step  Computation:  Approximate  a  solution,  sk,  to  the  trust  region  sub¬ 
problem  (5.1.1)  which  satisfies  (5.1.2). 

3.  Step  Acceptance:  Compute  the  ratio  pk  =  where 

predfc  =  mk( 0)  -  mk(sk)  and  aredfc  =  J(zk)  -  J(zk  +  sk). 

if  Pk  >  Vo  then  zk+1  =  zk  +  sk  else  zk+1  =  zk  end  if 

f.  Model  Update:  Choose  a  new  model,  mk+ 

5.  Trust  Region  Radius  Update: 

if  zk+1  =  zk  then  Ak+l  e  (0, 7i||sfc||^] 
else  define 

_  J(zk+ 1) 

Pk+1  mfc+i(0)  -  mk+1(-sk) 
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and  update  Afc+1 

by 

if  pk+i  <  Vi 

then 

Afc+i  G  (0, 72 1  SfclU] 

end  if 

if  pk+ i  e 

then 

Afc+i  G  [72 1  5fc|  Afc] 

end  if 

if  Pk+i  >  V2 

then 

Afc+i  G  [Ak,  Amax] 

end  if 

The  assumptions  I  will  make  on  the  objective  function,  J(z),  and  the  successive 
approximations  to  the  objective  function,  mk(s),  to  guarantee  first  order  convergence 
are 

Assumptions  5.2.2  -  Retrospective  Trust  Region: 

•  J  is  twice  continuously  Frechet  differentiable  and  bounded  below; 

•  mk  is  twice  continuously  Frechet  differentiable  for  k  —  1,2,..  .; 

•  There  exists  K\  >  0  such  that  \\T72 J(z)\\c{z,z)  <  «u  for  all  z  G  Z; 

•  There  exists  k2  >  1  such  that  \\'V2mk(s)\\c(z,z)  <  k2  —  1  for  all  s  €  Z  and  for 
all  k  =  1,  2, . .  .; 

•  There  exists  f  >  0  independent  of  k  such  that 

||Vmfe(0)  -  VJ(zk)\\z  <  ^min{||Vmfe(0)||2:,  Afc_i}  (5.2.1) 

for  k  =  1,2,. . .. 

These  assumptions  are  relaxed  from  the  assumptions  made  in  the  original  retrospec¬ 
tive  trust  region  work  [17].  Namely,  Bastin,  et  al.  assume  that  the  model  at  s  —  0, 
mfc(0),  and  its  gradient  at  s  =  0,  Vm^O),  are  equal  to  the  objective  function  at  z^, 
J(zk),  and  its  gradient  at  zk,  'VJ(zk),  respectively.  I  will  prove  later  that  my  weaker 
assumptions  are  sufficient  to  prove  that  the  retrospective  trust  region  algorithm  con¬ 
verges  to  a  first  order  critical  point. 
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Remark  5.2.3  As  seen  in  Algorithm  5.2.2,  each  successful  step  requires  two  new 
model  evaluations  to  compute  pk+ 1  (i.e.  mfc+i(0)  and  mk+i(—Sk)).  For  the  optimiza¬ 
tion  problems  considered  in  this  thesis,  additional  model  evaluations  are  very  expen¬ 
sive  as  they  require  the  stochastic  collocation  solution  to  the  state  equation.  Hence, 
for  some  classes  of  optimization  problems  it  is  unclear  whether  or  not  the  additional 
computational  cost  of  the  retrospective  trust  region  algorithm  is  worth  while. 

5.2.1  Discussion  of  Stopping  Criterion 

As  a  stopping  criterion,  I  suggest  using  a  model  gradient  stopping  test  and  a  step  size 
stopping  test.  First  of  all,  if  the  computed  step,  Sk  is  “sufficiently”  small  (sufficient 
here  depends  on  the  scaling  of  the  problem),  then  the  trust  region  algorithm  is  not 
making  significant  progress  and  should  be  terminated.  Given  a  step  tolerance  stol  >  0, 
this  condition  reads 

||Sfc||.Z  <  stol. 

On  the  other  hand,  if  the  modeled  gradient,  Vm^O),  is  “sufficiently”  small  (again 
sufficient  depends  on  the  scaling  of  the  problem),  then  the  algorithm  should  be  ter¬ 
minated.  Moreover,  the  inexact  gradient  conditions  (5.1.3)  implies  that 

||VJ(zfc)IU  <  ||VJ(zfc)  -Vmfc(0)|U  +  ||Vmfc(0)|U  <  (1  +  Oil Vmfc(0)||2. 

Thus,  if  gtol  >  0  and  ||  Vmfc(0)||.z  <  gtol,  then  it  is  clear  that 

l|VJ(zfe)IU  <  (i  +  Ogtoi. 

5.2.2  Convergence  of  the  Retrospective  Trust  Region 

In  this  section,  I  prove  that  under  Assumptions  5.2.2,  the  Algorithm  5.2.1  converges 
to  a  first  order  critical  point.  First,  I  prove  that  the  sequence  of  trust  region  radii 
must  converge  to  zero  if  the  norm  of  the  gradients  are  bounded  away  from  zero. 
Secondly,  I  show  that  under  these  assumptions,  pk  and  pk+i  converge  to  one.  Finally, 
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these  results  are  combined  to  prove  that  Algorithm  5.2.2  converges  to  a  first  order 
critical  point.  Most  results  presented  here  follow  the  standard  convergence  proof  for 
the  basic  trust  region  algorithm  provided  in  Theorem  4.10  in  [81],  although  care  must 
be  taken  to  handle  the  retrospective  trust  region  update. 

Lemma  5.2.4  Suppose  there  exists  e  >  0  such  that  ||Vmfc(0)||.2:  >  e  for  k  sufficiently 
large.  Then  the  sequence  of  trust  region  radii,  {Afc},  produced  by  Algorithm  5.2.1 
satisfies 

OO 

VI A k  <  oo- 

k= 1 

Proof:  First  notice  that  the  result  of  the  theorem  holds  if  there  is  only  a  finite 

number  of  successful  iterations  because  for  sufficiently  large  k,  A^+i  <  yiA*,.  Now, 
if  there  is  an  infinite  sequence  of  successful  iterations  {ki}  then  for  sufficiently  large 
i  the  fraction  of  Cauchy  decrease  condition  (5.1.2)  implies 

J(%k+ 1) 

>?7o(mfc(0)  -  mk(sk )) 

>'/7oKo||V/nfc(0)||^min  ^ Afc, 

>r]0n0Ake. 

This  implies  that  <  oo.  Furthermore,  for  every  unsuccessful  iteration 

k  ^  { ki },  the  trust  region  radius  satisfies  Ak  <  ryk  kj  Ak.  where  kj  E  {ki}  is  the 
largest  index  such  that  kj  <  k.  The  convergence  of  geometric  series  and  the  above 
result  imply  that 

^  OO  OO  /  1  \  oo 

and  1  +  EA^Ao°- 

k£{ki}  n  i= 1  k= 1  \  'V  i= 1 


This  proves  the  desired  result. 


□ 


96 


Lemma  5.2.4  will  be  used  to  arrive  at  a  contradiction.  To  obtain  this  contradiction, 
I  first  must  show  that  for  k  sufficiently  large,  Algorithm  5.2.1  produces  a  successful 
step. 


Lemma  5.2.5  Suppose  there  exists  e  >  0  such  that  ||  Vrrifc(0)||^  >  e  for  k  sufficiently 
large.  Then,  under  Assumptions  5.2.2,  the  ratios,  {pk},  converge  to  one. 


Proof:  By  Taylor’s  theorem,  there  exists  Ok  and  rjk  on  the  line  segment  between 
s  =  0  and  s  =  Sk  such  that 


aredfc  =  ( VJ(zk),sk)z  +  ~(V2J(9k)sk,  sk)z 
predfc  =  (Wmk(0),sk)z  +  ^(V2mk(pk)sk,  sk)Z- 


These  expansions  and  Assumptions  5.2.2  imply 

X 

|aredfc  -  predfc|  <  £Afc_iAfc  +  -(kx  +  k2  -  l)A2k. 


Furthermore,  the  fraction  of  Cauchy  decrease  condition,  (5.1.2),  and  the  assumption 
that  ||Vmfc(0)||.2:  >  e  imply  that  for  sufficiently  large  k, 

predfc  >  k0\\  Vrafc(0)||.z  min  |Afe,  >  KoeAk. 


Combining  these  inequalities  gives 


I  Pk  -  1|  <  efc 


^Afc_i  +  \(ki  +  ^2  —  l)Afc 

KqC 


for  sufficiently  large  k.  The  sequence  {e*,}  converges  to  zero  by  Lemma  5.2.4,  there¬ 
fore  proving  the  result.  □ 


In  addition  to  achieving  a  successful  step,  I  must  also  prove  that  Algorithm  5.2.1 
increases  the  trust  region  radius.  This  result  is  proved  in  a  similar  fashion  to  Lemma  5.2.5. 

Lemma  5.2.6  Suppose  there  exists  e  >  0  such  that  ||  Vmfc(0)||.z  >  e  for  k  sufficiently 
large.  Then,  under  Assumptions  5.2.2,  the  ratios,  {pk+ 1},  converge  to  one. 
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Proof:  By  Taylor’s  theorem,  there  exists  ijk+i  on  the  line  segment  between  s  =  0 
and  s  =  —sk  such  that 


mk+i(-sk)  -  mk+ 1(0)  =  -(Vmk+i(0),sk)z  +  -(V2mk+i(r]k+1)sk,  sk)z. 

This  equality,  the  expansion  of  predA.  in  the  proof  of  Lemma  5.2.5,  Assumptions  5.2.2, 
and  the  reverse  triangle  inequality  imply 

|predfc|  -  \(mk+i(—sk)  -  mfc+i(0))|  <  ||Vmfc+i(0)  -  Vmk(0)\\zAk  +  {k2  -  1)A|. 

To  bound  this  further,  notice  that 

||Vmfc+i(0)  -  Vmk(ti)\\z  <||Vmfc+i(0)  -  VJ(zk  +  sk) \\z  +  ||VJ(z*  +  sk)  -  VJ(zk) \\z 

+  ||VJ(^)-Vmfe(0)|U.  (5.2.2) 

The  hrst  and  third  expressions  on  the  right  hand  side  of  (5.2.2)  are  bounded  using 
(5.2.1),  and  the  second  expression  is  bounded  using  the  differentiability  of  J,  i.e. 


||VJ(zfc  +  sk)  -  VJ{zk)\\z  =  ||  /  V2J(^fc  +  tsk)skdt 

hi 


<  «iA k. 


This  proves  that 


I predfc |  -  \mk+i(-sk)  -  mfe+i(0)|  <  (£Afc  +  £Afc_i  +  mAk)Ak  +  (k2  ~  1)A 


which  implies  the  lower  bound 


2 


\mk+i(-sk)  -  mfc+i(0)|  >  | mk(sk)  -  mk( 0)|  -  ikAk 

with  ek  =  (£A k  +  £Afc__i  +  KiAk  +  (k2  —  l)Afc).  The  fraction  of  Cauchy  decrease 
condition  and  the  assumption  that  ||Vmfe(0)||2  >  e  imply 

\mk+i(-sk)  -  mk+ 1(0)|  >  (kq€  -  h) Ak. 


Since  ek  converges  to  zero  by  Lemma  5.2.4,  the  right  hand  side  of  the  above  inequality 
is  non-negative  for  sufficiently  large  k.  Following  the  proof  of  Lemma  5.2.5,  these 
bounds  imply 


Pk+i  —  1  A 


Afc(£  +  +  k-2  —  1)) 


0  as  k  — >  oo. 


—  £k 
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This  proves  the  desired  result. 


□ 


Combining  these  results  gives  the  desired  result  of  this  section;  namely,  these 
results  lead  to  a  proof  that  Algorithm  5.2.1  converges  to  a  first  order  critical  point. 

Theorem  5.2.7  Suppose  Assumptions  5.2.2  hold,  then 

lirninf  ||Vmfc(0)||2  =  liminf  \\S7J(zk)\\z  =  0. 

fc— XX)  fc— XX) 

Proof:  For  contradiction,  suppose  there  exists  e  >  0  such  that  ||  V-m.fc(0) ||^  >  e.  By 
Lemma  5.2.5,  for  k  sufficiently  large,  there  is  a  successful  step,  sk  since  pk  converges 
to  one.  By  Lemma  5.2.6,  for  k  sufficiently  large,  the  trust  region  radius  must  be  in¬ 
creased  since  pk+i  converges  to  one.  This  fact  contradicts  the  result  of  Lemma  5.2.4.  □ 


5.3  A  Framework  for  Model  Adaptivity 

Algorithms  5.1.1  and  5.2.1  along  with  gradient  conditions  (5.1.3)  and  (5.2.1),  re¬ 
spectively,  give  natural  frameworks  for  model  adaptivity.  As  long  as  the  modeled 
gradients  satisfy  the  gradient  error  bounds,  (5.1.3)  and  (5.2.1),  Algorithms  5.1.1  and 
5.2.1,  respectively,  are  globally  convergent.  Furthermore,  in  many  cases  it  is  impos¬ 
sible  (or  at  least  computationally  infeasible)  to  compute  V  J(z).  For  such  problems, 
the  bounds  (5.1.3)  and  (5.2.1)  ensure  global  convergence  with  out  the  necessity  of 
computing  \7.J(z)  as  long  as  there  exists  an  a  posteriori  error  indicator,  p,  such  that 

II V J (zk)  -  Vmk(0)\\z  <  Cp. 

Such  error  indicators  exist  when  the  model  nik(s)  corresponds  to  a  Galerkin  approxi¬ 
mation  for  PDE  discretization  or  a  stochastic  Galerkin  approximation  for  uncertainty 
quantification.  These  indicators  come  in  many  forms  such  as  residual  based  and  ad¬ 
joint  based  error  indicators.  When  m,k(s)  denotes  a  stochastic  collocation  approxi¬ 
mation  (i.e.  mk(s )  =  a(jCxkj(u(z),  z))  for  some  tensor  product  interpolation  operator, 
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Cxk,  built  on  an  admissible  index  set  Ifi)  I  approximate  the  error  indicator,  rj,  as  the 
contribution  to  the  gradient  of  mk(s)  associated  with  the  indices  on  the  margin  of 
the  index  set,  M.(Ifi).  This  can  be  accomplished  by  employing  Algorithm  4.2.3,  in 
which  case  the  margin  is  denoted  as  the  active  set,  A ,  and  the  error  indicator  is 

V  := 

i  eA 

In  general,  this  error  indicator  is  heuristic,  but  seems  to  work  well  in  practice.  Al¬ 
though  convergence  of  the  trust  region  algorithms  cannot  be  guaranteed  in  any  rigor¬ 
ous  manner,  Corollary  4.2.7  suggests  that  this  choice  of  error  indicator  may,  in  fact, 
be  sufficient. 

Now,  for  simplicity,  consider  the  test  problem  presented  in  Section  2.1  with  risk 
measure  <j(Y)  =  E[Y]  and  Algorithm  5.2.1.  Denote  the  objective  function  based 
on  the  index  set  Ip.  as  Jk(.z)  and  define  the  collocation  approximate  model  centered 
around  the  iterate  zk  as  mk(s)  =  Jk{zk  +  s).  The  model  gradient  is 

Vmfe(s)  =  YJk(z)  =  az  +  o  jCXk[B*p(z)\ 

=  az  +  R  1  E  o  AiB *p(zk) 
i  elk 

where  p(z)  =  p  solves  the  adjoint  equation  (2.3.2).  The  loop  for  satisfying  the  bound 
(5.2.1)  is  presented  in  Algorithm  5.3.1. 


Algorithm  5.3.1  -  Model  Adaptation: 

Let  Ik  =  Ik-i  and  A  =  M.(Ik) 

Set  rji  =  || E  o  AiB*p(zk)\\z  and  p  =  Y.ieA  7li 
Compute  Vfflt(O)  =  otzk  +  R_1  E  o  A;B *p(zk) 
Define  TOL  =  £min  j  ||  Vmfe(0)||2,  Afc_i| 

while  r)  >  TOL  do 

Select  i  G  A  corresponding  to  the  largest  rji 
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Set  A  <—  A  \  {i}  and  Xk  <—  XkU  {i} 

Update  the  error  indicator  p  =  p  —  pi 
for  i  =  1, . . . ,  d  do 
Set  j  =  i  +  e( 

iflk  U  {j}  is  admissible  then 
Set  A  <—  A  U  {j} 

Set  rjj  =  || E  o  AjB*p(zk)\\z 

Update  the  gradient  Vmk( 0)  V/nfc(0)  +  o  AjB*p(zfc) 

Update  the  error  indicator  p  <—  p  +  r/j 

Update  the  tolerance  TOL  =  £  min  1 1|  V'?TZfc(0)  ||^r,  Afc_i| 

end  if 
end  for 
end  while 


Algorithm  5.3.1  describes  the  application  of  adaptive  sparse  grid  stochastic  collo¬ 
cation.  In  particular,  Algorithm  5.3.1  employs  dimension  adaptive  sparse  grids.  Other 
forms  of  sparse  grid  adaptivity  exist  and  can  be  similarly  applied.  Some  other  forms 
of  adaptive  sparse  grids  are  domain  adaptive  sparse  grids  which  refer  to  partition¬ 
ing  the  domain  and  placing  sparse  grids  in  each  sub-domain  [1]  and  locally  adaptive 
sparse  grids  which  refers  to  the  use  of  continuous  piecewise  linear  interpolation  as  a 
basis  for  the  sparse  grids.  In  case  of  local  adaptation,  one  can  add  points  locally  and 
maintain  the  desirable  properties  of  sparse  grids  [73]. 

In  general,  there  may  be  multiple  discretizations  necessary  to  approximately  solve 
the  state  equation  (2.2.4).  In  the  case  that  e(w,  z\  y )  denotes  a  nonlinear  steady  PDE, 
the  stochastic  collocation  results  in  Q  nonlinear  PDEs  to  be  solved,  i.e.  e(uk,  z]  yk )  = 
0.  The  finite  element  method  (FEM)  is  a  common  approach  to  discretizing  e(uk,  z\  yk ) 
in  space.  Let  Vh  ■—  span{0i, . . . ,  (j)^}  C  V  be  a  finite  dimensional  subspace  of 
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the  deterministic  state  space  V  and  let  :=  spanj^i, . . .  ,if>N2}  C  W*  be  a  finite 
dimensional  subspace  of  W*.  The  Petrov-Galerkin  finite  element  discretization  of  the 
kth  state  equation  is 

(  e  (  Y]  uk,n(j)n ,  z;  yk)  )  =0  Vm  =  1, . . .  ,-/V2  (5.3.1) 

\  n=  1  /  W*,W 

and  one  solves  for  the  vector  tq,  :=  (wfcjl, . . .  , 'itfc,jv1)T  G  RNl.  When  coupling  FEM 
with  stochastic  collocation,  there  is  a  natural  error  splitting.  Let  UQh  =  UQh(z)  G 
L2p(T]  Vh)  denote  the  stochastic  collocation  solution  computed  by  solving  (5.3.1),  then 
the  error  in  the  solution  to  the  state  equation  can  be  bounded  by 

IK*)  -  uQh{z)\\u  <  II u(z)  -  uQ(z)\\u  +  II Uq(z)  -  uQh(z)\\u- 

The  first  term  of  this  bound  is  controlled  by  interpolation,  while  the  second  term  is 
controlled  by  controlling  the  finite  element  error.  Many  approaches  exist  to  adaptively 
control  the  error  associated  with  FEM.  One  popular  method  is  to  use  residual  based 
error  indicators.  These  indicators  are  computed  using  the  residual 

N\ 

Rk  ^  ^  ^  ^  Vk^j 

n=  1 

[35].  Other  possible  methods  are  averaging  techniques  [33]  and  adjoint  based,  goal 
oriented  error  indicators  [61].  Moreover,  the  authors  of  [127]  employ  these  FEM  error 
indicators  in  the  context  of  PDE  constrained  optimization  using  the  trust  region 
framework. 

Similar  observations  and  error  control  can  be  performed  for  the  solution  to  the 
adjoint  equation.  Recall  that  the  gradient  conditions  (5.1.3)  and  (5.2.1),  and  the  spe¬ 
cific  form  of  the  gradient  place  much  emphasis  on  controlling  the  error  in  the  adjoint 
state.  Care  must  be  taken  when  controlling  the  adjoint  error  because  the  adjoint 
directly  depends  on  the  solution  to  the  state  equation.  When  solely  considering  the 
stochastic  collocation  discretization,  the  state  error  is  automatically  controlled  due  to 
the  interpolation  properties  of  Cj,  i.e.  pk  =  Pk(uk(z))  depends  on  Uk  =  Uk(z)  which  is 
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exactly  the  solution  to  e(uk,  z;  yk)  =  0  for  k  =  1, . . . ,  Q.  When  considering  other  dis¬ 
cretization  techniques,  the  adjoint  error  control  must  also  contain  state  error  control. 
Typically,  error  bounds  for  the  FEM  discretization  are 

II p(u(z))  -Ph(uh(z)) ||W»  <Ci\\u(z)  -uh(z)\\u 

+  c2\\p(uh(z)  -ph(uh(z)) llw* 

where  p  =  p(u(z))  E  W*  is  the  solution  to  the  infinite  dimensional  adjoint  equation 
(2.3.2)  and  p  =  p(uh(z))  E  W*  is  the  solution  to  the  infinite  dimensional  adjoint 
equation  (2.3.2)  with  u(z)  replaced  by  Uh(z). 

With  the  above  discussion  in  mind,  one  can  incorporate  both  the  finite  element 
and  collocation  error  in  the  adaptive  loop  5.3.1  by  performing  the  error  splitting 

IIVJM  -  vjQh(z) iu  <  IIV j(z)  -  vjQ(z) IU  +  IIVJoW  -  vjQh(z)\\z 

where  Jq{z )  denotes  the  collocation  discretization  of  the  objective  function  and  Jqh(z) 
denotes  the  collocation  and  finite  element  discretized  objective  function.  Note,  the 
first  term  only  deals  with  collocation  error  and  the  second  term  only  deals  with  finite 
element  error.  Now,  suppose  one  is  considering  adding  an  index,  k,  to  the  current 
generalized  sparse  grid  index  set,  X.  If  XU{k}  remains  admissible,  then  one  can  derive 
the  error  indicator  r/k  (again,  note  that  this  is  only  collocation  discretized  and,  thus, 
not  necessarily  computable).  If  one  defines  a  similar  error  indicator  for  the  stochastic 
collocation  finite  element  discretized  problem,  rj then  one  can  approximate  ||^k||^ 
as 

\Mz<\\v±-vt\\z  +  U\\z. 

Here,  the  first  term  is  controlled  by  the  finite  element  error.  Controlling  the  error  in 
the  first  term  so  that 

WVk-r&Wz  <  c||^IU, 

where  c  >  0  is  fixed,  will  give  the  error  bound 


IMU<(c  +  iM|U 
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which  depends  only  on  the  computable  quantity,  ||/7k||.z- 

In  the  case  of  time  dependent  PDE  operators  e(u,  z\  y ),  one  can  consider  adaptive 
time  stepping  and  adaptive  basis  selection  for  model  order  reduction.  In  [77]  and 
[78,  79],  the  authors  perform  adaptive  time  stepping  for  the  optimal  control  of  de¬ 
terministic  parabolic  PDEs.  In  these  works,  the  authors  incorporate  error  control  for 
both  finite  element  and  time  stepping  discretization.  On  the  other  hand,  the  authors 
of  [48]  employ  the  trust  region  framework  to  adaptive  build  reduced  order  models  for 
parabolic  control  problems.  In  this  work,  the  reduced  order  models  are  built  using 
proper  orthogonal  decomposition.  In  general,  the  trust  region  algorithms  are  flexible 
enough  to  handle  any  sort  of  model  adaptivity  so  long  as  the  gradient  conditions 
(5.1.3)  and  (5.2.1)  are  satisfied.  Hence,  it  is  possible  to  incorporate  these  additional 
adaptive  strategies  using  Algorithm  5.3.1  and  similar  arguments  as  above. 


Chapter  6 


Implementation  Details 


This  chapter  is  dedicated  to  considerations  for  a  high  performance  implementation 
of  the  adaptive  stochastic  collocation  and  trust  region  framework  described  in  this 
thesis.  When  discretized,  the  optimization  problems  discussed  here  are  extremely  high 
dimensional  and  computationally  intensive.  The  numerical  solution  of  these  problems 
is  challenging  to  say  the  least.  When  implementing  the  methods  discussed  in  this 
thesis,  one  must  exploit  the  natural  parallelism  of  the  stochastic  collocation  method. 
Furthermore,  one  must  use  efficient  large-scale  nonlinear  programming  techniques  in 
the  implementation  of  Algorithms  5.1.1  and  5.2.1.  I  will  first  discuss  implementation 
of  a  high  fidelity  discretization  of  the  objective  function  and  ways  to  exploit  multiple 
forms  of  parallelism  in  function  evaluations  and  derivative  computations.  1  will  then 
present  nonlinear  programming  ideas  to  accelerate  the  trust  region  approach. 

6.1  High  Fidelity  Objective  Computation 

The  implementation  of  Algorithms  5.1.1,  5.2.1,  and  5.3.1  require  careful  consider¬ 
ation.  There  are  many  implementation  details  that  can  significantly  improve  the 
performance  of  these  algorithms.  When  using  either  trust  region  algorithm,  one  must 
compute  the  ratio  between  actual  and  predicted  decrease,  pk .  Each  evaluation  of  this 
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ratio  requires  the  computation  of  the  objective  function,  J(z).  In  general,  the  objec¬ 
tive  function  need  not  be  discretized  if  one  is  able  to  evaluate  the  infinite  dimensional 
function  J(z).  This  is  almost  never  the  case.  Since  this  thesis  is  concerned  with  dis¬ 
cretization  of  the  stochastic  space,  I  will  discretize  J(z)  using  stochastic  collocation. 
This  discretization  must  be  high  fidelity  in  order  for  adaptivity  to  be  meaningful. 
Thus,  for  this  high  fidelity  discretization,  I  will  use  isotropic  Smolyak  sparse  grids  of 
high  level  (c.f.  see  the  definition  in  Remark  4.2.2).  Recall  here  that  Corollary  3.4.2  and 
Theorem  3.4.4  prove  convergence  for  such  discretizations.  Using  isotropic  Smolyak 
sparse  grids  for  the  high  fidelity  discretization  is  advantageous  because  these  inter¬ 
polation  operators  require  much  fewer  interpolation  knots  as  opposed  to  full  tensor 
product  interpolation.  Furthermore,  a  priori  knowledge  of  the  anisotropy  in  the  opti¬ 
mization  problem  is  typically  not  available.  Without  this  knowledge,  one  cannot  build 
meaningful  anisotropic  sparse  grids.  As  a  consequence  of  the  adaptive  loop  (5.3.1), 
the  resulting  adapted  sparse  grid  contains  information  concerning  the  anisotropy  as¬ 
sociated  with  the  optimization  problem.  This  anisotropy  can  be  visualized  using  the 
final  index  set,  X,  and  can  be  used  to  help  characterize  the  importance  of  the  random 
variables  in  T. 

Although  isotropic  Smolyak  sparse  grids  provide  a  reduction  in  computational 
cost  when  compared  to  full  tensor  product  grids,  they  still  pose  a  possibly  enormous 
computational  burden.  Adaptive  collocation  allows  for  relatively  cheap  step  compu¬ 
tation  in  the  trust  region  framework,  but  this  may  be  outweighed  by  the  high  fidelity 
function  evaluations  required  at  each  iteration.  Since  my  adaptive  framework  uses 
low  order  adapted  sparse  grids  to  compute  gradients  and  Hessian  information,  there  is 
a  clear  advantage  when  compared  to  Newton-CG  applied  to  the  high  fidelity  problem 
as  long  as  many  trust  region  iterations  are  not  required.  Note  that  in  the  adaptive 
framework,  if  the  knots  associated  with  the  adapted  sparse  grids  are  subsets  of  those 
associated  with  the  high  fidelity  sparse  grid  (i.e.  nested  knots),  one  can  re-use  state 
computations.  Further  efficiency  can  be  achieved  by  exploiting  the  almost  trivial 
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parallelism  of  the  stochastic  collocation  method. 

6.2  Parallel  Collocation  and  Linear  Algebra 

The  stochastic  collocation  method  described  in  this  thesis  lends  itself  naturally  to 
parallel  implementation.  The  stochastic  collocation  method  for  the  solution  of  (2.2.4) 
requires  the  decoupled  solutions  of 

e(uk,z;yk)  =  0  V  k  =  1,...,Q 

for  fixed  z  E  Z.  Additionally,  in  the  optimization  context,  one  must  solve  the  adjoint 
equation  (2.3.2) 

eu(uk,  z ;  yk)k  +  jv(uk,  z)  =  0  V  k  =  1, . . . ,  Q. 

The  solution  to  the  adjoint  equation  at  the  kth  collocation  point,  Xk,  depends  on  the 
solution  to  the  state  equation  at  the  kth  collocation  point,  uk.  Therefore,  the  state  and 
adjoint  equations  must  be  solved  in  serial.  One  can  solve  the  state  and  adjoint  equa¬ 
tions  at  different  collocation  points  concurrently.  Now,  consider  the  approximation 
operator,  Cq ,  with  polynomial  representation 

Q 

(£Qu)(y)  =  ^Pk{y)uk. 

k=  1 

The  parallel  function  evaluation  and  gradient  computation  algorithm  is  listed  in  Al¬ 
gorithm  6.2.1. 


Algorithm  6.2.1  -  Parallel  Function  Evaluation  and  Gradient  Compu¬ 
tation: 

Given  z  e  Z  and  {yk}k=1  C  T,- 

for  k  —  1, . . . ,  Q  do 
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Compute  Uk  which  solves  e(uk,z)yk )  =  0; 

Compute  jk  =  j(uk,  z); 

Compute  Afc  which  solves  eu(uk,z;yk)* \k  +  ju(uk,z)  =  0. 

end  for 

Compute  JQ(z)  =  a  (  Y2=\  pk{y)jk)  / 

Compute  the  derivative  of  Jq(z)  as 

Q 

Jq(z )  =  y^/dk{ez(uk(z),  z;yk)* \k  +  jz(uk(z),  z)  j 

k= 1 


where  'Ok 


E 


a 


E?=i  p?(y)j(Mz),z))pk(y) 


Algorithm  6.2.1  demonstrates  that  the  state  and  adjoint  computations  can  all  be 
performed  in  parallel  on  multiple  processors,  but  each  process  must  communicate  to 
compute  objective  functions  and  gradients.  The  application  of  the  Hessian  of  Jq(z) 
to  a  vector,  v  E  Z,  exhibits  the  same  structure  and  parallelism  as  Algorithm  6.2.1. 
For  more  information  on  the  computation  of  second  order  information  see  [67]. 

The  parallelism  of  the  stochastic  collocation  discretization  is  essential  for  high 
dimensional  stochastic  spaces,  T.  In  general,  the  total  number  of  collocation  points, 
Q,  is  extremely  large  and  the  only  feasible  approach  to  solving  these  state  and  ad¬ 
joint  equations  is  to  exploit  this  parallelism.  Aside  from  the  immense  dimension  of 
the  collocation  space,  it  is  often  the  case  that  the  solution  to  the  state  and  adjoint 
equations  at  each  collocation  point  is  also  computationally  intense.  This  is  the  case 
when  e(u,  z;  y)  denotes  a  partial  differential  equation.  Common  PDE  discretization 
techniques  often  yield  extremely  high  dimensional  nonlinear  and  linear  systems  to  be 
solved  for  the  state  and  adjoint  equations,  respectively.  With  the  existence  of  many 
parallel  and  distributed  linear  algebra  tools,  it  is  common  practice  to  solve  the  state 
and  adjoint  equations  using  distributed  Krylov  subspace  methods.  Furthermore,  in 
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the  case  of  PDE  constraints,  much  work  has  gone  into  developing  parallel  precondi¬ 
tioners  for  solving  the  linearized  state  equation  and  the  adjoint  equation.  Many  such 
preconditioners  exist,  such  as  domain  decomposition  and  multi-level  preconditioners. 
Including  distributed  linear  algebra  to  Algorithm  6.2.1  gives  an  efficient,  fully  par¬ 
allel  approach  to  computing  derivatives.  The  use  of  linear  algebra  and  collocation 
parallelism  is  not  completely  straightforward  though.  Suppose  I  have  a  total  of  nproc 
processes  available  for  computation  and  suppose  I  wish  to  dedicate  nLA  processes  to 
linear  algebra.  Here,  I  assume  nLA  divides  nproc.  I  can  then  split  the  nproc  pro¬ 
cesses  into  nSC  =  groups  of  nLA  processes.  Each  of  these  nSC  groups  is  given 
a  subset  of  the  collocation  points  and  nLA  processes  to  perform  distributed  linear 
algebra  computations.  This  scheme  is  depicted  in  Figure  6.1. 
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Figure  6.1:  Depiction  of  the  communication  pattern  used  to  incorporate  distributed 
linear  algebra  and  parallel  stochastic  collocation  computations. 


6.3  Nonlinear  Programming  Considerations 

Although,  function  evaluations  and  derivative  computations  can  be  performed  exploit¬ 
ing  parallelism  in  the  linear  algebra  and  the  stochastic  collocation,  these  computations 
are  still  extremely  expensive.  For  this  reason,  it  is  crucial  to  use  efficient  nonlinear 
programming  techniques  in  the  implementation  of  Algorithms  5.1.1  and  5.2.1.  In  the 
case  that  Jq(z)  is  quadratic  (or  if  rrik(s)  are  chosen  to  be  quadratic  approximations 
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to  JQ(z)),  the  trust  region  subproblem  (5.1.1)  can  be  solved  using  the  truncated  con¬ 
jugate  gradient  (CG)  method  [41].  Truncated  CG  is  a  generalization  to  the  standard 
CG  algorithm  which  exits  if  the  current  CG  solution  is  outside  the  trust  region  or 
if  the  algorithm  finds  a  direction  of  negative  curvature.  Truncated  CG  requires  a 
function  that  applies  the  Hessian  of  rrik  to  a  vector.  As  mentioned  above,  Hessian 
times  a  vector  computations  are  similar  to  Algorithm  6.2.1  and  require  2 Q  linear  PDE 
solves.  Although  this  can  be  performed  using  the  parallelism  described  above,  if  CG 
requires  many  iterations,  the  trust  region  approach  quickly  becomes  computationally 
infeasible.  Adaptivity  helps  in  this  aspect  because  for  early  trust  region  iterations, 
the  number  of  collocation  points  is  small  and  thus,  only  a  few  linear  PDEs  must  be 
solved  at  each  CG  iteration. 

Another  method  of  reducing  the  number  of  CG  iterations  is  to  use  preconditioning. 
In  [67],  the  author  explores  using  low  order  sparse  grid  approximations  to  precondition 
the  Hessian.  This  method  works  well  for  small  problems,  but  when  the  stochastic 
dimension  is  large,  even  low  order  sparse  grids  have  many  collocation  points.  In 
this  work,  I  use  limited  memory  quasi-Newton  approximations  to  the  inverse  of  the 
Hessian  to  automatically  precondition  the  CG  iterations.  The  use  of  quasi-Newton 
preconditioners  was  presented  in  [80]  in  the  general  context  of  nonlinear  programming 
and  in  [23]  in  the  context  of  flow  control.  The  results  presented  in  [80]  and  [23]  appear 
to  be  inconclusive  on  whether  or  not  quasi-Newton  preconditioning  is  advantageous. 
For  the  problems  of  interest  to  this  thesis,  quasi-Newton  preconditioning  is  almost 
essential  for  solving  large  problems. 

In  the  case  that  Hessian  information  is  unavailable,  quasi-Newton  approximations 
can  be  used  to  generate  quadratic  approximations  of  Jq(z).  In  this  case,  one  can  use 
the  quasi-Newton  approximation  of  the  Hessian  and  the  inverse  Hessian  to  construct 
a  double  dog  leg  approach  for  the  solution  of  (5.1.1)  [66].  This  approach  approximates 
the  step  by  constructing  a  piecewise  linear  path  between  the  Cauchy  point  and  the 
quasi-Newton  point  (double  dog  leg  curve).  If  the  quasi-Newton  point  is  within  the 
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trust  region,  then  the  algorithm  chooses  this  as  the  new  step.  If  the  quasi-Newton 
point  is  outside  of  the  trust  region,  then  the  step  is  chosen  as  the  point  for  which 
the  double  dogleg  curve  intersects  the  trust  region.  On  the  other  hand,  one  can 
also  use  the  quasi-Newton  approximation  of  the  Hessian  with  truncated  CG  to  solve 
the  subproblem  (5.1.1).  The  CG  iterations  in  this  approach  are  inexpensive  and  the 
CG  algorithm  gives  better  steps  when  the  quasi-Newton  point  is  outside  of  the  trust 
region. 


Chapter  7 


Numerical  Examples 


In  this  chapter,  I  will  present  a  variety  of  numerical  examples  demonstrating  the 
power  and  necessity  of  adaptivity  in  solving  optimization  problems  governed  by  PDEs 
with  uncertain  coefficients.  Throughout  this  section,  I  will  refer  to  ten  separate  op¬ 
timization  algorithms.  These  algorithms  are  denoted:  NEWTONCG,  NEWTONCG 
+  BFGS,  TRCG  HESS,  TRCG  HESS  +  BFGS,  TRCG  BFGS,  TRDOGLEG  BFGS, 
RTRCG  HESS,  RTRCG  HESS  +  BFGS,  RTRCG  BFGS,  and  RTRDOGLEG  BFGS. 
NEWTONCG  denotes  Newton-CG  using  a  high  fidelity  stochastic  collocation  dis¬ 
cretization  of  the  optimization  problem.  NEWTONCG  +  BFGS  is  Newton-CG  using 
limited  memory  BFGS  preconditioning.  TRCG  stands  for  trust  region  and  RTRCG 
stands  for  retrospective  trust  region  where  the  trust  region  subproblems  are  solved 
using  truncated  CG.  The  suffix  “HESS”  refers  to  using  Hessian  information  and  the 
addition  of  “+  BFGS”  denotes  the  use  of  limited  memory  BFGS  preconditioning.  The 
suffix  “BFGS”  denotes  the  use  of  limited  memory  BFGS  to  approximate  the  Hessian. 
Finally,  TRDOGLEG  BFGS  and  RTRDOGLEG  BFGS  refer  to  the  use  of  the  double 
dogleg  approach  combined  with  limited  memory  BFGS  Hessian  approximations  to 
approximately  solve  the  trust  region  subproblem. 


Ill 
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7.1  One  Dimensional  Optimal  Control 


The  one  dimensional  examples  in  this  section  all  correspond  to  the  distributed  control 
of  the  steady  heat  equation.  In  this  section,  I  will  present  three  examples  each  demon¬ 
strating  different  forms  of  randomness.  One  particularly  nice  feature  of  my  adaptive 
approach  is  that  Algorithm  5.3.1  exploits  anisotropy  in  the  stochastic  dependence 
of  the  state  and  adjoint  equations  (i.e.  dependence  on  the  different  directions  Tk). 
This  anisotropy  ultimately  can  reduce  the  number  of  PDE  solves  required  for  the 
computation  of  derivative  information. 

Let  the  physical  domain  be  the  bounded  interval,  Del,  and  let  T  C  MM  denote 
the  stochastic  image  space.  T  is  endowed  with  the  uniform  probability  density  p(y). 
Throughout  this  section,  I  will  consider  the  quadratic  control  problem 

'Jiz  ^  :=  \E  -v\\2L2{D)  +  f  INI l\D) 
where  u(y )  =  u(y,  z)  G  Hq(D)  for  all  y  G  T  solves  the  state  equation 

-e(y)-^(y,x)  =  f(y,x)  +  z(x),  ( y,x)eTxD  (7.1.1) 

u(y,x)  =  0,  ( y,x )  G  T  x  d D 


To  relate  this  to  the  test  problem  in  Section  2.1,  Z  =  L2(D),  V  =  W  = 

if_1(-D),  7f  =  L2(D ),  and  Q  is  defined  by 

(Q u,v)H-i{D)tHuD)=  /  u(x)v(x) dx. 

J  D 

The  first  two  examples  will  use  truncated  KL  expansion  diffusivity  coefficients. 
The  isotropic  nature  of  the  solution  u(y)  =  u(y,z )  of  (7.1.1)  is  characterized  by  the 
decay  of  the  eigenvalues  of  the  covariance  function  [106].  This  decay  is  not  sufficient 
to  characterize  the  dependence  of  the  optimization  problem  on  the  directions  Yk .  For 
simplicity,  I  will  consider  the  state  operator,  A (y)  G  £(V,  W)  for  y  G  T,  given  as  a 
truncated  KL  expansion,  i.e. 

M 

A(y)  =  A0  +  ^  ^f\~kAkyk 

k=\ 


113 


and  the  operator  form  of  the  state  equation  (7.1.1) 

My)u(y)  +  Bz  =  0. 

Define  the  deterministic  reduced  cost  functional 

1 

j(z;y)  :=j(u(y;z),z)  =  ^\\u(y]  z)  -  v\\2l2{D)  +  -\\z\\2L2{D) 

where  u(y)  =  u(y;z)  solves  the  state  equation  (7.1.1).  Fix  z  G  Z  and  define  the 
function  f(y)  :=  j(z;y).  One  can  write  the  second  order  Taylor  approximation  of 
f(y)  centered  around  y  —  E[y\  as 

f(y)  =  f(y)  +  V/(y)T(y  -y)  +  ^(y-  y)T^2f(y  +  t(y  -  y))(y  -  y)  for  t  G  (0, 1). 

Note  that  since  V f(y)  is  independent  of  y  G  T,  E[V f  (y)T (y  —  y)]  =  0.  Now,  one  can 
explicitly  write  down  the  Hessian  V2/(C)  as 

V2/(0  =  (dluj(u((-,z),z)dyu((,z))  +  duj(u((,z),z)(d*yu((,z)). 

The  derivatives,  dyu( (,  z)  and  dyyu((,  z),  can  be  computed  via  implicitly  differentiat¬ 
ing  (7.1.1).  The  first  partial  derivative  of  u  with  respect  to  ijk  solves 

MOdyku(C,z)  +  v/^fcAfc«(C,^)  =  o 

and  the  second  partial  derivative  solves 

A(0<9jfcWM(C,  z)  +  y/XjAMC,  z)  +  v/A/,AA  i4;//(c,.  z)  =  0. 

From  these  derivatives,  it  is  clear  to  see  that  the  elements  of  the  Hessian  matrix, 
V2/(C)  satisfy  the  upper  bound 

M  M 

I  J(z)  -j(u(y\z),z) |  =  | E[f(y)}  -  f(y) J  <  C  EE  y/  ^j^k- 

j= 1  k= 1 

This  simple  analysis  demonstrates  the  fact  that  even  if  the  operator  A (y)  is  anisotropic 
with  respect  to  the  dimensions  T }■  for  k  =  1, . . . ,  M,  the  objective  function  may  de¬ 
pend  isotropically  on  the  parameters  y  E  T. 
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Remark  7.1.1  When  considering  the  state  equation 

Au(y)  +  Bz  =  b  (y), 

where  A  is  deterministic  and  b (y)  is  given  as  a  truncated  KL  expansion,  a  similar 
analysis  can  be  performed.  In  this  case,  the  solution  u(y)  =  u(y,  z)  G  V  for  all  y  G  T 
depends  linearly  on  y  G  T .  Thus,  plugging  u(y )  into  the  quadratic  objective  function, 
j(u(y,z),z),  gives  a  quadratic  dependence  of  f(y)  =  j(z',y )  on  the  stochastic  variables 
y  G  T.  Since  this  dependence  is  quadratic,  f(y)  is  exactly  equal  to  its  second  order 
Taylor  polynomial  and  the  Hessian  is  constant  with  respect  to  y  G  T.  This  gives  the 
upper  bound 

M 

I  J(z)  ~j{u{y\z),z) |  =  | E[f{y)  -  f(y)\  <  C^\k. 

k=  1 

Hence,  the  decay  in  the  eigenvalues,  \k,  completely  controls  the  anisotropic  depen¬ 
dence  of  the  objective  function  on  y  G  T . 

7.1.1  An  Isotropic  Example 

This  example  is  presented  in  [28]  and  serves  as  an  example  that  does  not  result  in 
an  anisotropic  sparse  grid.  The  physical  domain  is  D  =  (—1,1)  and  the  diffusivity 
coefficients  are  defined  as  the  truncated  KL  expansion 

M  1 

e(y,x)  =  e0(x)  +  ^  -■ ek(x)yk 

k=  1  ^ 

where  6q(x)  =  2  and  ek(x)  are  the  L 2  normalized  Lagrange  polynomials  built  Gauss- 
Legendre  interpolation  knots.  The  random  variables,  yk,  are  assumed  to  be  uniformly 
distributed  on  T  =  [—1, 1]M.  The  regularization  parameter  is  a  =  10~6  and  the  target 
profile  is 

v(x)  =  2  +  sign(cos(7rx)). 

For  the  numerical  examples,  I  hx  M  —  4  terms  in  the  KL  expansion.  I  have 
discretized  this  optimal  control  problem  in  space  using  continuous  piecewise  linear 
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FEM  on  a  uniform  mesh  of  128  intervals.  Figure  7.1  depicts  the  stochastic  collocation 
error  in  the  optimal.  For  the  “true”  solution,  I  solve  the  optimal  control  problem  using 
a  level  eleven  isotropic  Smolyak  sparse  grid  (Q  =  271617).  Here,  the  red  line  has 
slope  v  =  1.7.  This  slope  was  computed  using  least  squares.  The  table  in  Figure  7.1 
contains  the  L2(D)  error  associated  with  different  levels  of  isotropic  Smolyak  sparse 
grids  (£)  and  the  corresponding  number  of  sparse  grid  cubature  knots  ( Q ).  In  this 
table,  “rate”  refers  to  the  slope  between  two  consecutive  points. 


Control  Error 


1 

Q 

Error 

Rate 

0 

l 

6.48111191e+00 

1 

9 

2.93765793e-01 

1.408 

2 

41 

6.53655636e-01 

0.991 

3 

137 

1.43160490e-03 

3.167 

4 

401 

2.55996314e-04 

1.603 

5 

1105 

7.80872629e-06 

3.443 

6 

2925 

5.13762335e-06 

0.430 

7 

7537 

1.10353993e-05 

-0.808 

8 

18945 

9.86769644e-07 

2.619 

Figure  7.1:  (Left)  Collocation  error  in  the  optimal  controls.  The  red  line  denotes 
the  least  squares  fit  for  the  collocation.  The  estimated  convergence  rate  is  v  =  1.7. 
(Right)  L 2  error  and  associated  rate  of  decrease  for  the  optimal  controls. 


To  use  the  adaptive  collocation  and  trust  region  framework  described  in  this  thesis, 
I  employ  level  five  isotropic  Smolyak  sparse  grids  built  on  one  dimensional  Clenshaw- 
Curtis  interpolation  knots  ( Q  =  1105)  for  the  high  fidelity  approximation  of  the 
objective  function.  In  Figure  7.2,  I  have  plotted  the  optimal  control  on  the  left  and 
the  expected  value  of  the  solution  to  the  state  equation  computed  using  the  optimal 
control.  In  addition,  I  have  added  one  and  two  standard  deviation  intervals  around 
the  expected  value  of  the  state.  Furthermore,  in  Table  7.1,  I  compare  ten  different 
algorithms  described  above. 
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Optimal  Control  Optimal  State 


Figure  7.2:  (Left)  Computed  optimal  control.  (Right)  Expected  value  of  optimal 
state  (blue  solid  line)  plus  one  (red  dashed  line)  and  two  (black  dashed  line)  standard 
deviations. 
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TR 

Adaptive 

PDE 

CP 

Reduction 

NEWTONCG 

7 

0 

224,315 

1,105 

NEWTONCG  +  BFGS 

6 

0 

114,920 

1,105 

1.95 

TRCG  HESS 

6 

88 

110,367 

769 

2.03 

TRCG  HESS  +  BFGS 

5 

89 

46,934 

809 

4.78 

TRCG  BFGS 

60 

71 

196,671 

721 

1.14 

TRDOGLEG  BFGS 

61 

71 

202,825 

721 

1.11 

RTRCG  HESS 

6 

88 

113,561 

769 

1.98 

RTRCG  HESS  +  BFGS 

5 

89 

48,630 

809 

4.61 

RTRCG  BFGS 

57 

88 

240,162 

769 

0.93 

RTRDOGLEG  BFGS 

59 

71 

234,543 

721 

0.96 

Table  7.1:  This  table  contains  the  total  number  of  outer  iterations  (TR),  the  total 
number  of  adaptive  steps  (Adaptive),  the  total  number  of  PDE  solves  (PDE),  the 
total  number  of  collocation  points  in  the  final  sparse  grid  (CP),  and  the  reduction 
factor  of  total  PDE  solves  required  by  the  specified  algorithm  versus  Newton-CG 
(Reduction). 
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7.1.2  A  Mildly  Anisotropic  Example 

In  this  example,  the  physical  domain  is  D  —  (0,1).  To  construct  the  diffusivity 
coefficients,  define  the  sets  ,  jj]  for  k  =  1, . . . ,  M.  The  diffusivity  parameter 

is  defined  as 

m  M  —  k 

e(y, x)  =  {(M  -  k  +  0.01)  +  —  yk}xsk(x)- 

k=  1 

This  type  of  diffusion  parameters  is  called  checkerboard  diffusion  and  is  often  used 
for  numerical  examples.  Each  interval  in  the  checkerboard  diffusion  parameter  has 
decreasing  importance  (i.e.  the  diffusivity  on  each  interval  is  scaled  by  This 

type  of  diffusion  was  studied  in  [39].  The  stochastic  image  space  is  T  :=  [0, 1]M 
endowed  with  the  uniform  distribution,  p(y )  =  1.  The  regularization  parameter  is  set 
to  a  =  10~6  and  the  desired  profile  is  given  by 

v(x)  =  2x  +  sin(27nr). 

I  have  discretized  this  optimal  control  problem  using  piecewise  linear  finite  ele¬ 
ments  on  a  uniform  mesh  of  128  intervals.  I  choose  M  =  4  random  variables  and,  for 
the  high  fidelity  approximation  of  the  objective  function,  I  use  a  level  five  isotropic 
Smolyak  sparse  grid  built  on  one  dimensional  Clenshaw-Curtis  interpolation  knots 
( Q  =  1105).  In  Table  7.2,  I  compare  the  ten  different  algorithms  described  above. 
This  table  clearly  demonstrates  the  advantage  of  using  my  adaptive  approach.  With 
no  preconditioning,  the  adaptive  approach  experiences  about  a  seven  fold  reduction 
in  the  number  of  PDE  solves  required  to  obtain  the  optimal  controls.  Adding  limited 
memory  BFGS  preconditioning  further  reduces  the  number  of  PDE  solves  required 
and  results  in  a  ten  fold  reduction.  Figure  7.3  depicts  the  computed  optimal  con¬ 
trol  (left)  and  the  expected  value  of  the  optimal  state  (blue  solid  line)  plus  one  (red 
dashed  line)  and  two  (black  dashed  line)  standard  deviation  intervals.  Figure  7.3 
clearly  demonstrates  the  mild  anisotropy  associated  with  this  optimization  problem. 
One  will  notice  that  the  standard  deviation  is  smaller  for  x  >  0  and  decreases  as  x 
tends  toward  one.  Figure  7.4  demonstrates  the  stochastic  collocation  discretization 
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error.  The  red  line  in  the  left  image  is  a  least  squares  fit  to  the  error  curve.  The  least 
squares  estimated  slope  is  u  —  3.5.  The  table  on  the  left  contains  the  L2(D)  error 
for  the  different  levels  of  isotropic  Smolyak  sparse  grid  and  their  associated  slopes 
between  two  consecutive  levels.  Finally,  Figure  7.5  depicts  the  optimal  controls  (left) 
corresponding  to  the  deterministic  substitute  problem  where  the  random  variables 
y  E  T  were  replaced  with  y  =  E[y\.  The  right  image  in  Figure  7.5  depicts  the  ab¬ 
solute  error  between  the  controls  computed  for  the  deterministic  problem  and  the 
controls  computed  for  the  stochastic  problem. 


TR 

Adaptive 

PDE 

CP 

Reduction 

NEWTONCG 

6 

0 

143650 

1105 

NEWTONCG  +  BFGS 

6 

0 

90610 

1105 

1.59 

TRCG  HESS 

6 

21 

20685 

185 

6.94 

TRCG  HESS  +  BFGS 

5 

27 

14058 

261 

10.22 

TRCG  BFGS 

41 

19 

59848 

153 

2.40 

TRDOGLEG  BFGS 

51 

18 

75347 

141 

1.91 

RTRCG  HESS 

6 

21 

21431 

185 

6.70 

RTRCG  HESS  +  BFGS 

5 

27 

14626 

261 

9.82 

RTRCG  BFGS 

41 

19 

64660 

153 

2.22 

RTRDOGLEG  BFGS 

51 

18 

81882 

141 

1.75 

Table  7.2:  Algorithm  comparison  for  checkerboard  diffusivity  one  dimensional  exam¬ 
ple. 


120 


Optimal  Control  Optimal  State 


Figure  7.3:  (Left)  Computed  optimal  control.  (Right)  Expected  value  of  optimal 
state  (blue  solid  line)  plus  one  (red  dashed  line)  and  two  (black  dashed  line)  standard 
deviations. 


Control  Error 


£ 

Q 

Error 

Rate 

0 

l 

2.09105168e+01 

1 

9 

2.09008865e+00 

1.048 

2 

41 

2.18699979e-02 

3.007 

3 

137 

3.18712385e-06 

7.322 

4 

401 

4.41271949e-08 

3.985 

5 

1105 

0.0 

Figure  7.4:  (Left)  Collocation  error  in  the  optimal  controls.  The  least  squares  fit 
red  line  has  slope  v  =  3.5.  (Right)  L2  error  and  associated  rate  of  decrease  for  the 
optimal  controls. 


121 


Optimal  Control  Control  Errors 


Figure  7.5:  (Left)  The  optimal  controls  for  the  deterministic  problem  with  y  G  T 
replaced  by  y  =  E\y\  (solid  black  line).  The  red  dashed  line  is  the  control  computed 
via  the  stochastic  problem.  (Right)  Errors  between  the  optimal  controls  for  the 
stochastic  problem  and  the  optimal  controls  for  the  mean  value  problem. 
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7.1.3  An  Anisotropic  Example 

This  example  is  motivated  by  subsurface  flow  control  through  fractured  media.  In 
particular,  I  am  interested  in  the  situation  where  the  location  of  the  fractures  is  un¬ 
certain.  Furthermore,  this  example  investigates  the  effects  of  discontinuous  diffusion 
parameters  on  the  convergence  of  the  stochastic  collocation  method.  The  diffusivity 
parameters  in  the  example  are  not  given  as  a  truncated  KL  expansion.  Further¬ 
more,  this  example  incorporates  a  stochastic  right  hand  side.  The  physical  domain  is 
D  =  (—1,1)  and  the  stochastic  image  space  is  T  =  [—0.1,  0.1]  x  [—0.5,  0.5]  endowed 
with  the  uniform  probability  density.  The  diffusion  parameter  is  defined  as 

e(y,x)  =  eiX(-i,yi)(x)  +  e2X(yi,i){x) 
with  Ei  =  0.1,  62  =  10  and  the  forcing  term  is  defined  as 

/(!M)  =  e-<-»>a. 

The  regularization  parameter  is  set  to  a  =  10~4  and  target  profile  is  v  =  1. 

I  discretized  the  state  equation  in  space  using  continuous  piecewise  linear  finite 
elements  on  a  uniform  mesh  of  IV  =  128  intervals.  In  order  to  obtain  accurate  results, 
the  discontinuity  in  e  should  be  aligned  with  mesh  vertices.  To  do  this,  each  sample 
point,  yi  G  [—0.1,  0.1],  can  be  added  as  a  vertex  to  the  mesh.  The  collocation  space 
is  built  on  one  dimensional  Gauss-Patterson  interpolation  knots.  The  high  fidelity 
collocation  discretization  is  performed  using  a  level  seven  isotropic  Smolyak  sparse 
grid  ( Q  =  1793).  The  optimization  results  are  depicted  in  Figure  7.6  and  Figure  7.7. 
Figure  7.6  illustrates  the  adaptive  sparse  grid  at  the  final  step  of  optimization.  The 
right  image  contains  the  active  (red)  and  old  (blue)  index  sets.  The  union  of  these 
two  sets  gives  the  generalized  sparse  grid  index  set,  X.  The  left  figure  depicts  the 
resulting  sparse  grid  of  collocation  points  corresponding  to  X.  From  this  index  set, 
the  anisotropy  associated  with  this  optimization  problem  is  clear;  that  is,  much  more 
refinement  is  necessary  in  the  yi  direction  as  opposed  to  the  y2  direction.  Figure  7.7 
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displays  both  the  optimal  controls  and  the  expected  value  of  the  state  plus  one  and 
two  standard  deviation  intervals.  The  iteration  history  for  the  trust  region  framework 
is  displayed  in  Table  7.3. 

Index  Set,  X  Sparse  Grid  Nodes 


C\J 


8 

7 

6 

5 

4 

3 

2 

1 


Full 
■  Old 
|  Active 


1  2  3  4.  5  6  7  8 

i . 


Figure  7.6:  (Left)  Generalized  sparse  grid  index  set.  The  red  blocks  denote  “active” 
indices  and  the  blue  blocks  denote  “old”  indices.  The  gray  blocks  denote  the  in¬ 
dices  in  the  isotropic  Smolyak  index  set  of  level  eight.  (Right)  Collocation  points 
corresponding  to  the  index  set  X  =  A  U  O. 


Figure  7.8  depicts  the  optimal  controls  (left)  corresponding  to  the  deterministic 
substitute  problem  where  the  random  variables  y  G  T  were  replaced  with  y  =  E[y\. 
The  right  image  in  Figure  7.8  depicts  the  absolute  error  between  the  controls  com¬ 
puted  for  the  deterministic  problem  and  the  controls  computed  for  the  stochastic 
problem.  Figure  7.9  depicts  the  stochastic  collocation  error  associated  with  different 
levels  of  the  isotropic  Smolyak  sparse  grid.  The  least  squares  estimated  convergence 
rate  (red  line)  is  u  —  0.7.  The  convergence  rate  is  severely  diminished  from  the  pre¬ 
vious  one  dimensional  example.  This  convergence  rate  is  expected  due  to  the  lack 
of  smoothness  in  the  state  and  adjoint  equations  with  respect  to  y\.  In  fact,  this 
convergence  rate  is  roughly  the  same  as  Monte  Carlo  (i.e.  v  =  0.5). 
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Optimal  Control  Optimal  State 


Figure  7.7:  (Left)  Computed  optimal  control.  (Right)  Expected  value  of  computed 
optimal  state  with  one  and  two  standard  deviation  intervals  added. 


k 

J  (^/c) 

U 

1 

A  k 

CG 

Adaptive 

CP 

1 

2.264427e-01 

1.855366e-02 

1.528843e+02 

1000 

5 

6 

1 

2 

1.509245e-01 

8.555873e-04 

4.918603e+01 

2500 

5 

7 

49 

3 

1.349774e-01 

6.153557e-05 

8.451017e+01 

5000 

10 

2 

385 

4 

1.346243e-01 

3.457096e-06 

1.502707e+01 

5000 

12 

2 

513 

5 

1.346233e-01 

2.368099e-07 

2.097707e+00 

5000 

12 

2 

545 

6 

1.346233e-01 

1.022377e-08 

1.196515e-01 

5000 

11 

2 

641 

7 

1.346233e-01 

5.780752e-10 

8.693302e-03 

5000 

13 

0 

769 

Table  7.3:  (Iteration  History)  k  is  the  number  of  trust  region  iteration,  J(zk)  is 
the  objective  function  value,  \\^  Jx{^k)\\z  is  the  model  gradient  norm  value,  || ||^ 
is  the  step  size,  A*,  is  the  trust  region  radius,  CG  is  the  number  of  CG  iterations, 
Adaptive  is  the  number  of  sparse  grid  adaptation  iterations,  and  CP  is  the  number 
of  collocation  points. 
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Optimal  Control  Control  Errors 


Figure  7.8:  (Left)  The  optimal  controls  for  the  deterministic  problem  with  y  £  L 
replaced  by  y  —  E[y]  (solid  black  line).  The  red  dashed  line  is  the  control  computed 
via  the  stochastic  problem.  (Right)  Errors  between  the  optimal  controls  for  the 
stochastic  problem  and  the  optimal  controls  for  the  mean  value  problem. 
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0.929 

Figure  7.9:  (Left)  Collocation  error  in  the  optimal  controls.  The  least  squares  fitted 
convergence  rate  is  v  =  0.7.  (Right)  L2  error  and  associated  rate  of  decrease  for  the 
optimal  controls. 
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7.2  Source  Inversion  Under  Uncertainty 

Source  inversion  problems  are  of  paramount  importance  to  many  fields  such  as  the 
monitoring  of  carbon  output.  In  the  deterministic  setting,  the  source  inversion  prob¬ 
lem  is  to  determine  the  location  and  magnitude  of  the  sources  from  observed  data. 
Typically,  these  observations  are  point  observations  and  the  physics  governing  the 
system  are  assumed  to  be  known.  I  will  consider  a  steady  state  source  inversion  prob¬ 
lem  where  the  governing  physics  are  ad vection- diffusion.  Let  D  =  (0,  l)d  C  for 
d  =  2,  3  be  the  physical  domain.  The  PDE  describing  the  physical  system  is 

—eAv  +  V  •  Vw  —  z  in  D  (7-2.1) 

Vw  ■  n  —  0  on  dD n 
v  —  g  on  <9.Dd 

where  8Dn  denotes  the  outflow  boundary  and  dDo  denotes  the  Dirichlet  boundary. 
The  source  inversion  problem  with  Tikhonov  regularization  can  be  written  as 

min  j(z)  :=  -  \v(xk;  z)  -  vk\2  +  ^  \\z\\% .  (7.2.2) 

zez  z  L '  z 

k= 1 

where  v  =  v(z)  solves  (7.2.1).  This  optimization  problem  is  an  instance  of  the  test 
problem  from  Section  2.1  with  Z  =  L2(D ),  V  =  He(D),  W  =  V*,  7 H  =  M0  with  the 
Euclidean  inner  product,  q  —  v  e  M0,  and  Q  G  C(He(D),M.N)  is  defined  as  the  point 
evaluation  operator 

Qy  =  (u(xi), . . . ,  v(.r0))  • 

Note  that  Q  is  a  continuous  linear  operator  if  £  >  0  and  —t  <  —  d  +  |.  Q  is 
also  continuous  if  V  =  Hl(D)  fl  C°(D).  I  will  pose  the  optimization  problem  in 
V  =  H1(D)  and  discretize  (7.2.1)  using  continuous  finite  elements.  Therefore,  the 
discretized  solution  of  (7.2.1)  is  a  member  of  Hl(D)  fl  C°(D)  and  Q  is  well  defined. 

A  popular  alternative  to  solving  the  deterministic  problem  (7.2.2),  is  to  use 
Bayesian  inference.  In  the  Bayesian  framework,  one  assumes  there  is  some  statistical 
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noise  associated  with  the  model  (7.2.1)  and  the  observations,  v.  Incorporating  this 
noise  and  prior  knowledge  of  the  sources  allows  one  to  write  down  Bayes’  rule  for  the 
conditional  probability.  This  framework  is  highly  subjective  as  the  optimal  solution  is 
based  on  prior  knowledge.  To  circumvent  this  subjectivity,  I  have  formulated  (7.2.2) 
as  a  statistical  inverse  problem  by  assuming  random  held  advection  coefficients,  V. 
This  is  a  frequentist  approach  to  statistical  inverse  problems  for  which  I  can  apply 
my  adaptive  stochastic  collocation  and  trust  region  framework.  Furthermore,  I  will 
focus  on  the  expected  value  risk  measure,  cr(Y)  =  E\Y]. 

I  discretize  the  advection-diffusion  equation  (7.2.1)  using  streamline  upwind  Petrov- 
Galerkin  (SUPG)  stabilized  continuous  piecewise  linear  finite  elements  built  on  a  uni¬ 
form  mesh  of  quadrilaterals.  My  SUPG  implementation  is  based  on  [47].  The  side 
length  of  the  quadrilaterals  used  will  be  denoted  by  h.  Let  V/,,  =  span{0i, . . . ,  4>n}  C 
V  denote  the  FEM  basis.  The  linear  operators  A (y)  and  B(y)  from  Section  2.1  are 
A (y)  G  M7VxJV  and  B  G  WNxN  and  are  defined  via  the  SUPG  finite  element  discretiza¬ 
tion  as 


A  ij(y) 


eV 4>i(x)  ■  V(pj(x)  +V(y,x)  ■  \/<f)i(x)(j)j(x) dx 


+  r(h )  [  (V(y)  •  V0j(x))(V(y)  •  V(pj(x))dx 
J  D 

/  4>i(x)(j)j(x)dx  -  r(h)  /  (f>i(x)(V(y)  ■  V(f>j(x))dx. 

J  D  J  D 

Here,  r  =  r(h)  denotes  the  SUPG  parameter.  Furthermore,  the  Riesz  operator  R 
corresponding  to  the  inner  product  on  Z  is  written  as  R  G  M7VxAr  such  that 


Rq  =  /  4>i(x)4>j(x)dx. 


I D 


The  inverse  problem  (7.2.2)  is  typically  mesh  dependent  and  convergence  varies  ac¬ 
cording  to  mesh  size.  To  overcome  this  mesh  dependence,  I  use  the  discretized  objec¬ 
tive  function 


K 2 


o 


Jqh(z)  =  —  ^EQ[\uh(xk]z) 


%|2] 


a , 


k=\ 


+  -^\\A\z 


128 


where  Eq  denotes  the  quadrature  approximation  to  the  expected  value,  z  =  Yln=i  zn(pn 
(otz  =  (zi, . . .  ,zn)t  G  RN,  and  uh(y)  =  uh(y;z)  =  Y^n=i  un<\>n  and  14  =  (iq, . .  .,uN)T  G 
Rn  solves 

My)tih(y)  +  &{y)z  =  o. 

The  scaling  h2  to  the  mismatch  term  in  the  objective  function  gives  a  scale  indepen¬ 
dent  regularization  term. 

I  have  implemented  this  problem  using  the  Euclidean  inner  product  and  the  L2(D) 
inner  product  defined  on  the  discretized  control  space.  The  L2(D)  inner  product  on 
the  discretized  control  space  corresponds  to  the  inner  product 

(. z ,  s)z  =  TtRs 

where  z  =  J2n=i  zn(pn  and  s  =  J2n=i  sn4>n  for  z—  (zi, zn)t  and  s—  (si, . . . ,  sN)T  G 
RN .  In  the  L2(D )  inner  product  the  gradient  is 

VJQh(z)  =  OiZ  +  RT^gfBVh], 

while  in  the  Euclidean  inner  product  the  gradient  is 

WJQh(z)  =  aRT  +  EQ[B*ph]. 

In  both  cases,  ph  denotes  the  vector  of  coefficients  corresponding  to  the  solution  to 
the  finite  element  approximation  to  the  adjoint  equation 

A *{y)ph  +  (Q*Q uh  -  v)  =  0 

where  Q  G  Mc>xJV  is  the  observation  operator  (i.e.  part  of  an  identity  matrix  scaled 
by  h  if  the  observations  {xk]k=\  correspond  to  mesh  vertices). 

7.2.1  Two  Dimensional  Source  Inversion 

In  two  dimensions  (d  =  2),  the  Neumann  and  Dirichlct  boundaries  are 


dDN  =  {1}  x  [0, 1]  and  dDn  =  3D\  dDN. 
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and  the  inhomogeneous  Dirichlet  conditions  are 

g(y,  x)  =  o  for  x  g  (o,  l)  x  {o,  l}, 
and  on  {x  G  d Dd  :  x  G  {0}  x  (0, 1)} 

g{y,x)  =  di(y1,x2) 


where  d\  :  [—1, 1]  x  (0, 1)  — >  [0, 1]  is  dehned  as 

0  if  C  i  (0.25,0.75), 

di(7,C)  =  sin(27r(C—  :0.25))  if  C  G  (0.25,  0.75)  and  7  =  0,  (7.2.3) 

k  sin  (yr cxp(tlpC(2 °) -5i 3 ~ 1 } )  if  C  e  (0.25,  0.75)  and  7^0. 

The  diffusivity  parameter  is  deterministic  and  set  to  e  =  1CF2  and  the  two  dimensional 

random  advection  held  is 


V(y,x) 


M 


To Vk 
k 


e 


cos (0y2) 
sm(dy2) 


where  70  =  0.05,  71  =  0.0833,  7 k  =  ,kZ \)2  for  k  —  3, . . , ,  M,  x  =  0.5,  and  9  = 
Furthermore,  the  random  vector,  y  G  Y  :=  [—1, 1]M,  is  uniformly  distributed  with 
joint  density,  p(y )  = 

The  true  sources  and  observed  data  are  depicted  in  Figure  7.10.  For  this  ex¬ 
ample,  there  are  fifteen  true  sources  that  are  randomly  distributed  throughout  the 
subdomain  [0.3, 1]  x  [0, 1]  with  random  magnitudes  and  widths.  The  observed  state 
is  computed  by  solving  the  state  equation  with  the  random  variable  y  G  T  replaced 
with  y  =  E[y\  =  0.  Point  observations  of  the  observed  state  are  taken  at  each  mesh 
vertex.  The  computational  mesh  used  is  a  uniform  64  by  64  mesh  of  quadrilaterals. 
The  observed  state  and  stochastic  state  are  solved  on  this  mesh  using  continuous 
piecewise  linear  finite  elements.  For  the  numerical  results  presented  here,  M  =  5  and 
the  collocation  points  are  taken  to  be  level  four  isotropic  Smolyak  sparse  grid  knots 
built  on  one  dimensional  Clenshaw-Curtis  interpolation  knots.  Figure  7.11  depicts  the 
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computed  sources  and  the  resulting  expected  value  of  the  state.  Both  Figure  7.10  and 
Figure  7.11  only  display  the  sources  in  the  subdomain  [0.3, 1]  x  [0, 1].  The  computed 
sources  in  the  full  domain  are  depicted  in  Figure  7.12.  Notice  that  there  are  oscilla¬ 
tions  near  the  Dirichlet  boundary  x\  —  0.  This  phenomenon  is  due  the  discrepancy 
between  the  inhomogeneous  Dirichlet  conditions  E[g(y,x)\  and  g(0,x)  (i.e.  the  dis¬ 
crepancy  between  the  stochastic  state  equation  and  the  deterministic  observed  state 
equation).  The  difference  in  the  Dirichlet  boundary  conditions  can  be  seen  in  the  top 
right  image  in  Figure  7.12.  One  will  also  notice  in  the  bottom  image  of  Figure  7.12 
that  the  uncertain  Dirichlet  conditions  are  a  main  source  of  variability  in  the  state 
equation.  The  adapted  sparse  grid  index  set  for  this  2D  source  inversion  problem  is 
displayed  in  Figure  7.13.  Figure  7.13  characterizes  the  anisotropy  associated  with  the 
random  variables  in  this  problem.  The  top  left  image  shows  isotropy  and  a  strong 
dependence  on  the  first  two  directions  yi  and  y2 .  Directions  y\  and  y2  obtain  the 
maximum  level  for  the  sparse  grid  (i.e.  t  =  5).  These  directions  correspond  to  the 
Dirichlet  condition  and  the  angle  of  advection.  The  subsequent  images  demonstrate 
the  decreasing  importance  of  the  random  variables  y 3,  y 4,  and  2/5,  which  all  correspond 
to  the  advection  amplitude. 
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True  Sources 


Observed  State 


Figure  7.10:  (Left)  True  sources.  (Right)  Observed  state  computed  by  solving  the 
state  equation  with  y  —  0. 


Computed  Sources 


Expected  Value  of  State 


Figure  7.11:  (Left)  Computed  sources.  (Right)  Expected  value  of  optimal  state. 
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Computed  Sources 


Dirichlet  Condition 


Standard  Deviation 


Figure  7.12:  (Left)  Computed  sources.  (Right)  The  expected  value  of  the  inhomo¬ 
geneous  Dirichlet  condition  (black)  and  the  Dirichlet  condition  evaluated  at  y  —  0. 
This  difference  accounts  for  the  hot  spot  near  the  boundary  X\  —  0  in  the  left  im¬ 
age.  (Bottom)  Standard  deviation  of  the  state.  Notice  that  most  variation  is  due  to 
Dirichlet  conditions. 
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Directions  i i,  i2,  D 
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Directions  i2,  *3,  ii 
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1  ^4;  ^5 


Figure  7.13:  The  final  adapted  sparse  grid  index  set  for  2D  source  inversion. 
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7.2.2  Three  Dimensional  Source  Inversion 

In  three  dimensions  (d  =  3),  the  Neumann  and  Dirichlet  boundaries  are 
dDN  =  {1}  x  [0, 1]  x  [0, 1]  and  dDD  =  dD  \  dDN. 
and  the  inhomogeneous  Dirichlet  conditions  are 

g(y, x)  =  0forxe  (o,  l)  x  (o,  l)  x  (o,  1}  u  (0, 1)  x  {0, 1}  x  (0, 1), 
and  on  {a;  G  dD^  '■  x  G  {0}  x  (0, 1)  x  (0, 1)} 

g(y,x)  =  d1(y1,x2)d1(y2,x3 ) 

where  ^1(7,  C)  is  defined  in  (7.2.3).  The  diffusivity  parameter  is  deterministic  and  set 
to  e  =  10”2  and  the  two  dimensional  random  advection  field  is 

cos (0y3) 
sin  (dy3) 

0 

where  70  =  0.05,  71  =  0.0833,  jk  =  for  k  —  4, . . . ,  M,  x  =  0.5,  and  6  = 

Furthermore,  the  random  vector,  y  G  Y  :=  [—1, 1]M,  is  uniformly  distributed  with 
joint  density,  p(y)  =  pr. 

The  true  sources  and  observed  data  are  depicted  in  Figure  7.14.  As  in  the  2D 
source  inversion  example,  there  are  fifteen  true  sources  that  are  randomly  distributed 
throughout  the  subdomain  [0.3, 1]  x  [0.1]  x  [0, 1]  with  random  magnitudes  and  widths. 
The  observed  state  is  computed  by  solving  the  state  equation  with  the  random  variable 
y  G  T  replaced  with  y  —  E\y\  =0.  Point  observations  of  the  observed  state  are  taken 
at  each  mesh  vertex.  The  computational  mesh  used  is  a  uniform  32  by  32  by  32  mesh 
of  hexahedron.  The  observed  state  and  stochastic  state  are  solved  on  this  mesh  using 
continuous  piecewise  linear  finite  elements.  For  the  numerical  results  presented  here, 
M  =  6  and  the  collocation  points  are  taken  to  be  level  two  isotropic  Smolyak  sparse 
grid  knots  built  on  one  dimensional  Clenshaw-Curtis  interpolation  knots.  Figure  7.15 


O2 -x)2  M 


V(y,x)  =  (e  ^ 


7o  Vk 


(x2-x)2 
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depicts  the  computed  sources  in  the  upper  right  image,  contour  lines  of  the  expected 
value  of  the  state,  and  contour  lines  of  the  standard  deviation  of  the  state.  Notice 
that  there  are  oscillations  near  the  Dirichlet  boundary  x\  =  0  as  was  the  case  in 
the  2D  source  inversion  example.  This  phenomenon  is  due  the  discrepancy  between 
the  inhomogeneous  Dirichlet  conditions  E\g(y,x)\  and  g(0,x)  (i.e.  the  discrepancy 
between  the  stochastic  state  equation  and  the  deterministic  observed  state  equation). 

True  Sources  Observed  State 
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0.9 
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0.6 

0.5 
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Figure  7.14:  (Left)  True  sources.  These  sources  are  plotted  using  an  isosurface  of 
z  =  0.2.  (Right)  Observed  state  computed  by  solving  the  state  equation  with  y  =  0. 
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Computed  Sources 
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Figure  7.15:  Isosurface  of  z  —  0.2  for  the  computed  sources  (left),  contours  of  the 
expected  value  of  the  optimal  state  (right),  and  contours  of  the  standard  deviation  of 
the  optimal  state  (bottom).  Notice  the  phenomenon  near  the  inhomogeneous  Dirichlet 
boundary  in  the  upper  left  figure.  This  “fake  source”  is  due  to  the  difference  between 
E[g(y,x)\  and  <7(0,0;)  as  in  the  2D  source  inversion  example. 
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Conclusions  and  Future  Work 


I  have  presented  in  the  thesis  a  general  problem  formulation  for  equality  constrained 
optimization  problems  where  the  equality  constraint  depends  on  random  inputs.  This 
general  formulation  includes  many  interesting  optimization  problems  such  as  the  risk- 
averse  optimal  control  and  design  of  PDEs  with  uncertain  coefficients.  The  main 
contributions  of  my  work  are  the  analysis  and  algorithms  developed  for  the  risk- 
averse  optimization  of  PDEs  with  uncertain  coefficients.  I  have  developed  a  sparse 
grid  stochastic  collocation  discretization  scheme  for  these  optimization  problems  and 
extended  error  bounds  from  the  theory  of  stochastic  collocation  for  PDEs  with  uncer¬ 
tain  coefficients  to  the  case  of  optimization.  Furthermore,  I  have  employed  general¬ 
ized  sparse  grids  to  obtain  efficient  discretizations  of  these  optimization  problems  and 
have  proven  certain  interpolation  and  approximation  properties  for  these  generalized 
sparse  grids  operators.  I  have  developed  an  adaptive  stochastic  collocation  framework 
for  the  efficient  solution  of  optimization  problems  governed  by  PDEs  with  uncertain 
coefficients.  This  adaptive  framework  utilizes  adaptive  sparse  grids  to  generate  inex¬ 
pensive  approximate  models  and  guides  adaptivity  using  the  trust  region  algorithm. 
I  have  applied  this  framework  using  both  the  basic  trust  region  algorithm  and  the 
retrospective  trust  region  algorithm.  For  the  retrospective  trust  region  algorithm,  I 
have  proven  global  first  order  convergence  under  a  weakened  condition  on  gradient 
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exactness.  Finally,  I  have  implemented  this  trust  region  framework  employing  trun¬ 
cated  conjugate  gradients  (CG)  to  solve  the  trust  region  subproblem.  To  increase  the 
efficiency  of  CG  and  further  reduce  the  number  of  PDE  solves  required  at  every  trust 
region  iteration,  I  employ  automatic  preconditioning  using  limited  memory  BFGS 
inverse  Hessian  approximations. 

This  thesis  is  focused  on  the  efficient  solution  of  the  reduced  space  problem  (2.2.1) 
using  adaptive  sparse  grid  collocation.  The  trust  region  algorithm  that  I  have  devel¬ 
oped  here  is  not  limited  to  adaptivity  in  the  stochastic  dimension,  but  can  also  be 
extended  to  spatial  adaptivity  via  finite  elements  and  even  model  order  reduction 
adaptivity  for  time  dependent  problems  via  adaptive  snapshot  selection  for  projec¬ 
tion  based  reduced  order  modeling.  Although  the  method  I  have  developed  is  tied 
to  non-intrusive  methods,  one  may  be  able  to  extend  adaptive  polynomial  chaos  and 
Taylor  series  approximation  methods  to  the  optimization  context  using  this  trust  re¬ 
gion  framework.  These  additional  outlets  for  adaptivity  are  yet  to  be  studied  and  are 
natural  extensions  of  my  doctoral  work. 

The  incorporation  of  risk  measures  brings  about  many  issues  for  analysis,  com¬ 
putation,  and  algorithms,  but  allows  for  explicit  handling  of  the  risk  or  variation 
associated  with  each  design  or  control.  Risk  measures  are  a  natural  means  of  quan¬ 
tifying  tail  value  risk  in  engineering  design  and  safety  analysis  where  the  goal  is  to 
determine  a  design  that  will  withstand  extreme  and  rare  events.  Convergence  analysis 
is  a  first  step  to  incorporating  risk  measures  in  an  optimization  scheme.  It  is  essential 
to  know  that  the  controls  computed  via  a  discretization  of  (2.2.1)  do  converge  to  the 
true  controls  as  the  discretizations  are  refined.  This  is  a  challenging  task  as  most 
risk  measures  are  not  Frechet  differentiable.  This  convergence  analysis  can  be  per¬ 
formed  by  substituting  a  smooth  approximation  of  the  risk  measure.  Tracking  these 
discretization  and  approximation  errors  through  to  the  optimal  controls  is  essential 
for  accurate  quantification  of  risk  and  uncertainty. 

In  addition  to  risk  measures,  I  would  like  to  incorporate  general  constraints  into 
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my  optimization  formulation.  In  particular,  I  would  like  to  tackle  chance  constraints. 
Chance  constraints  offer  a  natural  means  of  accurately  estimating  operating  ranges 
and  fault  tolerances  for  certifying  engineering  components  in  control  and  design  prob¬ 
lems.  As  mentioned  earlier,  chance  constraints  may  cause  issues  for  gradient  based 
algorithms  since  they  are  not  necessarily  Frechet  differentiable.  These  constraints  can 
be  handled  in  a  similar  way  to  risk  measure  via  smooth  approximation.  Again,  these 
approximation  errors  must  be  tracked  through  to  the  optimal  control  values.  The  ad¬ 
dition  of  general  constraints  is  beyond  the  scope  of  my  trust  region  framework.  The 
algorithm  I  have  proposed  works  for  unconstrained  reduced  space  problems,  (2.2.1). 
In  the  presence  of  state  and  control  constraints,  my  trust  region  algorithm  must  be 
extended.  I  plan  to  extend  my  algorithmic  capabilities  to  the  full  space  and  deter¬ 
mine  an  appropriate  manner  of  incorporating  adaptivity.  Such  adaptive  full  space 
algorithms  have  been  considered  in  the  case  of  finite  element  adaptivity  in  [127]. 

Risk  measures  and  chance  constraints  are  problem  formulation  issues  and  are 
dictated  by  the  desired  application.  A  main  concern  of  these  optimization  problems 
is  algorithmic  efficiency.  In  the  case  of  time  dependent  problems,  some  form  of  model 
order  reduction  is  critical  in  making  the  numerical  solution  of  such  problems  feasible. 
I  am  interested  in  coupling  projection  based  model  order  reduction  techniques  with 
my  stochastic  collocation  framework.  In  this  case,  a  fixed  projection  basis  (i.e.  fixed 
snapshots)  may  not  be  necessary.  In  fact,  the  basis  can  be  built  adaptively  in  the 
same  manner  as  the  sparse  grid  using  the  inexact  gradient  condition,  (5.1.3).  Another 
approach  to  incorporating  model  order  reduction  is  to  fix  the  projection  basis  by  a 
greedy  sampling  of  the  parameter  space  to  choose  “optimal”  snapshots  [31]. 
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